Abstract

We establish that the feedback capacity of the trapdoor channel is the logarithm of the golden ratio and provide 
a simple communication scheme that achieves capacity. As part of the analysis, we formulate a class of dynamic 
programs that characterize capacities of unifilar finite-state channels. The trapdoor channel is an instance that admits 
a simple analytic solution.

I. Introduction

David Blackwell, who has done fundamental work both in information theory and in stochastic dynamic
programming, introduced the trapdoor channel in 1961 [1] as a "simple two-state channel". The channel is depicted
in Figure 1, and a detailed discussion of this channel appears in the information theory book by Ash [2], where
the channel is indeed shown on the cover of the book.

The channel behaves as follows. Balls labeled '0' or '1' are used to communicate through the channel. The 
channel starts with a ball already in it. To use the channel, a ball is inserted into the channel by the transmitter, 
and the receiver receives one of the two balls in the channel with equal probability. The ball that does not exit the 
channel remains inside for the next channel use. 
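
The channel behavior described above can be sketched in a few lines of Python. This is only an illustrative simulation under the stated description; the function name `trapdoor_channel` and its parameters are assumptions made for this example, not part of the original work.

```python
import random

def trapdoor_channel(inputs, initial_ball=0, rng=None):
    """Simulate the trapdoor channel on a sequence of input balls (0 or 1).

    Returns (outputs, final_ball), where final_ball is the ball left inside.
    """
    rng = rng or random.Random()
    inside = initial_ball            # the channel starts with a ball already in it
    outputs = []
    for ball in inputs:
        # Two balls are now in the channel; each exits with probability 1/2.
        if rng.random() < 0.5:
            outputs.append(inside)   # the old ball exits, the new ball stays
            inside = ball
        else:
            outputs.append(ball)     # the new ball exits, the old ball stays
    return outputs, inside

if __name__ == "__main__":
    xs = [1, 0, 0, 1, 1, 0, 1, 0]
    ys, last = trapdoor_channel(xs, initial_ball=0, rng=random.Random(1))
    print("inputs :", xs)
    print("outputs:", ys, "ball left inside:", last)
```

Because no ball is created or destroyed, the number of 1-balls among the initial ball and the inputs always equals the number of 1-balls among the outputs and the ball left inside.
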

Fig. 1. The trapdoor (chemical) channel.

Another appropriate name for this channel is the chemical channel. This name suggests a physical system in which
the concentrations of chemicals are used to communicate, such as might be the case in some cellular biological 
systems. The transmitter adds molecules to the channel and the receiver samples molecules randomly from the 
channel. The trapdoor channel is the most basic realization of this type of channel; it has only two types of 
molecules and there are only three possible concentrations, (0, 0.5, 1), or alternatively only one molecule remains 
in the channel between uses. 

Although the trapdoor channel is very simple to describe, its capacity has been an open problem for 45 years 
[1]. The zero-error capacity was found by Ahlswede et al. [3], [4] to be 0.5 bits per channel use. More recently, 
Kobayashi and Morita [5] derived a recursion for the conditional probabilities of output sequences of length n given 
the input sequences and used it to show that the capacity of this channel is strictly larger than 0.5 bits. Ahlswede 
and Kaspi [3] considered two modes of the channel called the permuting jammer channel and the permuting relay
channel. In the first mode there is a jammer in the channel who attempts to frustrate the message sender by selective 
release of balls in the channel. In the second mode, where the sender is in the channel, there is a helper supplying 
balls of a fixed sequence at the input, and the sender is restricted to permuting this sequence. The helper collaborates 
with the message sender in the channel to increase his ability to transmit distinct messages to the receiver. Ahlswede 
and Kaspi [3] gave answers for specific cases of both situations and Kobayashi [6] established the answer to the 
general permuting relay channel. More results for specific cases of the permuting jammer channel can be found
in [8].

In this paper we consider the trapdoor channel with feedback. We derive the feedback capacity of the trapdoor 
channel by solving an equivalent dynamic programming problem. Our work consists of two main steps. The first 
step is formulating the feedback capacity of the trapdoor channel as an infinite-horizon dynamic program, and the 
second step is finding explicitly the exact solution of that program. 

Formulating the feedback capacity problem as a dynamic program appeared in Tatikonda's thesis [9] and in work 
by Yang, Kavcic and Tatikonda [10], Chen and Berger [11], and recently in a work by Tatikonda and Mitter [12]. 
Yang et al. [10] have shown that if a channel has a one-to-one mapping between the input and the state, it is
possible to formulate feedback capacity as a dynamic programming problem and to find an approximate solution by 
using the value iteration algorithm [13]. Chen and Berger [11] showed that if the state of the channel is a function 
of the output then it is possible to formulate the feedback capacity as a dynamic program with a finite number of 
states. 

Our work provides the dynamic programming formulation and a computational algorithm for finding the feedback
capacity of a family of channels called unifilar finite-state channels (FSCs), which includes the channels considered
in [10], [11]. We use value iteration [13] to find an approximate solution and to generate a conjecture for the exact
solution, and the Bellman equation [14] to verify the optimality of the conjectured solution. As a result, we are able
to show that the feedback capacity of the trapdoor channel is $\log \phi$, where $\phi = (1+\sqrt{5})/2$ is the golden ratio. In addition,
we present a simple encoding/decoding scheme that achieves this capacity. The remainder of the paper is organized
as follows. Section II defines the channel setting and the notation throughout the paper. Section III states the main
results of the paper. Section IV presents the capacity of a unifilar FSC in terms of directed information. Section
V introduces the dynamic programming framework and shows that the feedback capacity of the unifilar FSC can
be characterized as the optimal average reward of a dynamic program. Section VI shows an explicit solution for
the capacity of the trapdoor channel by using the dynamic programming formulation. Section VII gives a simple
communication scheme that achieves the capacity of the trapdoor channel with feedback, and finally Section VIII
concludes this work.

II. Channel Models and Preliminaries 

We use subscripts and superscripts to denote vectors in the following ways: $x^j = (x_1, \ldots, x_j)$ and $x_i^j = (x_i, \ldots, x_j)$
for $i < j$. Moreover, we use lower case $x$ to denote sample values, upper case $X$ to denote random variables,
calligraphic letter $\mathcal{X}$ to denote the alphabet and $|\mathcal{X}|$ to denote the cardinality of the alphabet. The probability
distributions are denoted by $p$ when the arguments specify the distribution, e.g. $p(x|y) = p(X = x | Y = y)$. In
this paper we consider only channels for which the input, denoted by $\{X_1, X_2, \ldots\}$, and the output, denoted by
$\{Y_1, Y_2, \ldots\}$, are from finite alphabets, $\mathcal{X}$ and $\mathcal{Y}$, respectively. In addition, we consider only the family of FSCs
known as unifilar channels as considered by Ziv [15]. An FSC is a channel that, for each time index, has one of
a finite number of possible states, $s_{t-1}$, and has the property that $p(y_t, s_t | x^t, s^{t-1}, y^{t-1}) = p(y_t, s_t | x_t, s_{t-1})$. A
unifilar FSC also has the property that the state $s_t$ is deterministic given $(s_{t-1}, x_t, y_t)$.

Definition 1: An FSC is called a unifilar FSC if there exists a time-invariant function $f(\cdot)$ such that the state
evolves according to the equation

$$s_t = f(s_{t-1}, x_t, y_t). \qquad (1)$$

We also define a strongly connected FSC, as follows.

Definition 2: We say that a finite state channel is strongly connected if for any state $s$ there exists an integer
$T$ and an input distribution of the form $\{p(x_t|s_{t-1})\}_{t=1}^{T}$ such that the probability that the channel reaches state $s$
from any starting state $s'$, in less than $T$ time-steps, is positive. I.e.

$$\sum_{t=1}^{T} \Pr(S_t = s | S_0 = s') > 0, \quad \forall s \in \mathcal{S}, \ \forall s' \in \mathcal{S}. \qquad (2)$$

Fig. 2. Unifilar FSC with feedback

We assume a communication setting that includes feedback as shown in Fig. 2. The transmitter (encoder) knows
at time $t$ the message $m$ and the feedback samples $y^{t-1}$. The output of the encoder at time $t$ is denoted by $x_t$
and is a function of the message and the feedback. The channel is a unifilar FSC and the output of the channel $y_t$
enters the decoder (receiver). The encoder receives the feedback sample with one unit delay.

A. Trapdoor Channel is a Unifilar FSC 

The state of the trapdoor channel, which is described in the introduction and shown in Figure 1, is the ball, 0 or
1, that is in the channel before the transmitter transmits a new ball. Let $x_t \in \{0, 1\}$ be the ball that is transmitted
at time $t$ and $s_{t-1} \in \{0, 1\}$ be the state of the channel when ball $x_t$ is transmitted. The probability of the output
$y_t$ given the input $x_t$ and the state of the channel $s_{t-1}$ is shown in Table I.

TABLE I

The probability of the output $y_t$ given the input $x_t$ and the state $s_{t-1}$.

x_t   s_{t-1}   p(y_t = 0 | x_t, s_{t-1})   p(y_t = 1 | x_t, s_{t-1})
0     0         1                           0
0     1         0.5                         0.5
1     0         0.5                         0.5
1     1         0                           1

The trapdoor channel is a unifilar FSC. It has the property that the next state $s_t$ is a deterministic function of
the state $s_{t-1}$, the input $x_t$, and the output $y_t$. For a feasible tuple, $(x_t, y_t, s_{t-1})$, the next state is given by the
equation

$$s_t = s_{t-1} \oplus x_t \oplus y_t, \qquad (3)$$

where $\oplus$ denotes the binary XOR operation.
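
The XOR rule of eq. (3) can be checked against direct bookkeeping of the two balls. The sketch below is illustrative only; the helper names `output_prob`, `next_state` and `remaining_ball` are assumptions introduced for this example, and the channel law is taken from the verbal description of the channel.

```python
from itertools import product

def output_prob(y, x, s_prev):
    """p(y_t = y | x_t = x, s_{t-1} = s_prev) for the trapdoor channel."""
    if x == s_prev:                  # both balls carry the same label
        return 1.0 if y == x else 0.0
    return 0.5                       # different labels: either exits w.p. 1/2

def next_state(s_prev, x, y):
    """Eq. (3): the ball left inside is s_{t-1} XOR x_t XOR y_t."""
    return s_prev ^ x ^ y

def remaining_ball(s_prev, x, y):
    """Direct bookkeeping: remove one ball labeled y from the pair (s_prev, x)."""
    balls = [s_prev, x]
    balls.remove(y)                  # raises ValueError for infeasible tuples
    return balls[0]

# Compare the two descriptions on every feasible tuple (x_t, y_t, s_{t-1}).
for s_prev, x, y in product((0, 1), repeat=3):
    if output_prob(y, x, s_prev) > 0:
        assert next_state(s_prev, x, y) == remaining_ball(s_prev, x, y)
```

The loop only visits feasible tuples, which matches the restriction stated with eq. (3): for infeasible tuples, such as receiving a 1 when both balls are 0, the XOR expression has no physical meaning.
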



B. Trapdoor Channel is a Permuting Channel 

It is interesting to note, although not consequential in this paper, that the trapdoor channel is a permuting channel
[16], where the output is a permutation of the input (Fig. 3). At each time $t$, a new bit is added to the sequence
and the channel switches the new bit with the previous one in the sequence with probability 0.5.

Fig. 3. The trapdoor channel as a permuting channel. Going from left to right, there is a probability of one half that two adjacent bits switch
places.

III. Main Results

• The capacity of the trapdoor channel with feedback is

$$C = \log \frac{1+\sqrt{5}}{2}. \qquad (4)$$

Furthermore, there exists a simple capacity-achieving scheme, which is presented in Section VII.

• The problem of finding the capacity of a strongly connected unifilar channel (Fig. 2) can be formulated as an
average-reward dynamic program, where the state of the dynamic program is the probability mass function 
over the states conditioned on prior outputs, and the action is the stochastic matrix p(x\s). By finding a solution 
to the average-reward Bellman equation we find the exact capacity of the channel. 

• As a byproduct of our analysis we also derive a closed form solution to an infinite horizon average-reward 
dynamic program with a continuous state-space. 
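
For a sense of scale, the value in eq. (4) can be evaluated numerically. The snippet below assumes the logarithm is taken to base 2, so that the result is in bits per channel use, matching the units used for the zero-error capacity in the introduction.

```python
import math

phi = (1 + math.sqrt(5)) / 2          # golden ratio
capacity_bits = math.log2(phi)        # eq. (4), assuming a base-2 logarithm

print(f"phi = {phi:.6f}")
print(f"feedback capacity = {capacity_bits:.4f} bits per channel use")
print("zero-error capacity from [3], [4] = 0.5 bits per channel use")
```

The feedback capacity is roughly 0.694 bits per channel use, which lies above the zero-error capacity of 0.5 bits quoted from [3], [4].
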



IV. The Capacity Formula for a Unifilar Channel with Feedback 

The main goal of this section is to prove the following theorem which allows us to formulate the problem as a 
dynamic program. 

Theorem 1: The feedback capacity of a strongly connected unifilar FSC when initial state $s_0$ is known at the
encoder and decoder can be expressed as

$$C_{FB} = \sup_{\{p(x_t|s_{t-1}, y^{t-1})\}_{t \ge 1}} \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t | Y^{t-1}) \qquad (5)$$

where $\{p(x_t|s_{t-1}, y^{t-1})\}_{t \ge 1}$ denotes the set of all distributions such that $p(x_t|y^{t-1}, x^{t-1}, s^{t-1}) = p(x_t|s_{t-1}, y^{t-1})$
for $t = 1, 2, \ldots$.

Theorem 1 is a direct consequence of Theorem 3 and eq. (26) in Lemma 4, which are proved in this section.
For any finite state channel with perfect feedback, as shown in Figure 2, the capacity was shown in [17], [18]
to be bounded as

$$\lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1})} \max_{s_0} I(X^N \to Y^N | s_0) \ge C_{FB} \ge \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1})} \min_{s_0} I(X^N \to Y^N | s_0). \qquad (6)$$

The term $I(X^N \to Y^N)$ is the directed information (footnote 2) defined originally by Massey in [25] as

$$I(X^N \to Y^N) \triangleq \sum_{t=1}^{N} I(X^t; Y_t | Y^{t-1}). \qquad (7)$$

The initial state is denoted as $s_0$ and $p(x^N||y^{N-1})$ is the causal conditioning distribution defined in [17], [22] as

$$p(x^N||y^{N-1}) \triangleq \prod_{t=1}^{N} p(x_t | x^{t-1}, y^{t-1}). \qquad (8)$$

The directed information in eq. (6) is under the distribution of $p(x^N, y^N)$ which is uniquely determined by the causal
conditioning, $p(x^N||y^{N-1})$, and by the channel.
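
To make definitions (7) and (8) concrete, the following sketch computes the directed information of the trapdoor channel by brute-force enumeration for small $N$, with the initial state $s_0$ assumed known. The function `directed_information` and the callback `p_one` (which returns one factor $p(x_t = 1 | x^{t-1}, y^{t-1})$ of the causal conditioning) are hypothetical names introduced here. The sketch relies on the fact that, with $s_0$ known, $p(y_t | x^t, y^{t-1}, s_0) = p(y_t | x_t, s_{t-1})$ for a unifilar channel.

```python
import math
from itertools import product

def p_out(y, x, s):
    """Trapdoor channel law p(y_t | x_t, s_{t-1})."""
    if x == s:
        return 1.0 if y == x else 0.0
    return 0.5

def directed_information(n, p_one, s0=0):
    """I(X^n -> Y^n | s_0) in bits, by enumeration over all (x^n, y^n).

    p_one(xs, ys) is a hypothetical callback returning
    p(x_t = 1 | x^{t-1} = xs, y^{t-1} = ys), one factor of eq. (8).
    """
    joint = {}     # p(x^n, y^n | s_0)
    chan = {}      # prod_t p(y_t | x_t, s_{t-1}) along each path
    for xs, ys in product(product((0, 1), repeat=n), repeat=2):
        s, p_in, p_ch = s0, 1.0, 1.0
        for t in range(n):
            q = p_one(xs[:t], ys[:t])
            p_in *= q if xs[t] == 1 else 1.0 - q
            p_ch *= p_out(ys[t], xs[t], s)
            s = s ^ xs[t] ^ ys[t]          # unifilar state update, eq. (3)
        if p_in * p_ch > 0:
            joint[xs, ys] = p_in * p_ch
            chan[xs, ys] = p_ch
    p_y = {}
    for (xs, ys), p in joint.items():
        p_y[ys] = p_y.get(ys, 0.0) + p
    # Sum over t of I(X^t; Y_t | Y^{t-1}) = E[log2(prod_t p(y_t | x^t, y^{t-1}) / p(y^n))]
    return sum(p * math.log2(chan[xs, ys] / p_y[ys])
               for (xs, ys), p in joint.items())

uniform = lambda xs, ys: 0.5               # i.i.d. uniform inputs, feedback ignored
for n in (1, 2, 3):
    print(n, directed_information(n, uniform) / n)
```

The enumeration grows as $4^n$, so this is useful only as a sanity check for very short blocks; a feedback-dependent input is obtained by letting `p_one` depend on its `ys` argument.
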

2 In addition to feedback capacity, directed information has recently been used in rate distortion [19], [20], [21], network capacity [22], [23]
and computational biology [24].

In our communication setting we are assuming that the initial state is known both to the decoder and to the
encoder. This assumption allows the encoder to know the state of the channel at any time $t$ because $s_t$ is a
deterministic function of the previous state, input and output. In order to take into account this assumption, we use
a trick of allowing a fictitious time epoch before the first actual use of the channel in which the input does not
influence the output nor the state of the channel and the only thing that happens is that the output equals $s_0$ and is
fed back to the encoder such that at time $t = 1$ both the encoder and the decoder know the state $s_0$. Let $t = 0$
be the fictitious time before starting the use of the channel. According to the trick, $Y_0 = S_0$ and the input $X_0$ can
be chosen arbitrarily because it does not have any influence whatsoever. For this scenario the directed information
term in eq. (6) becomes

$$I(X_0^N \to Y_0^N | s_0) = I(X^N \to Y^N | s_0). \qquad (9)$$

The input distribution becomes

$$p(x_0^N || \{s_0, y^{N-1}\}) = p(x^N||y^{N-1}, s_0), \qquad (10)$$

where $p(x^N||y^{N-1}, s_0)$ is defined as $p(x^N||y^{N-1}, s_0) \triangleq \prod_{t=1}^{N} p(x_t|x^{t-1}, y^{t-1}, s_0)$. Therefore, the capacity of a
channel with feedback for which the initial state, $s_0$, is known both at the encoder and the decoder is bounded as

$$\lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \max_{s_0} I(X^N \to Y^N | s_0) \ge C_{FB} \ge \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N | s_0). \qquad (11)$$



Lemma 2: If the finite state channel is strongly connected, then for any input distribution $p_1(x^N||y^{N-1}, s_0)$ and
any $s_0'$ there exists an input distribution $p_2(x^N||y^{N-1}, s_0')$ such that

$$\frac{1}{N} \left| I_{P_1}(X^N \to Y^N|s_0) - I_{P_2}(X^N \to Y^N|s_0') \right| \le \frac{c}{N} \qquad (12)$$

where $c$ is a constant that does not depend on $N$, $s_0$, $s_0'$. The term $I_{P_1}(X^N \to Y^N|s_0)$ denotes the directed
information induced by the input distribution $p_1(x^N||y^{N-1}, s_0)$ where $s_0$ is the initial state. Similarly, the term
$I_{P_2}(X^N \to Y^N|s_0')$ denotes the directed information induced by the input distribution $p_2(x^N||y^{N-1}, s_0')$ where $s_0'$
is the initial state.



Proof: Construct $p_2(x^N||y^{N-1}, s_0')$ as follows. Use an input distribution, which has a positive probability of
reaching $s_0$ in $T$ time epochs, until the time that the channel first reaches $s_0$. Such an input distribution exists
because the channel is strongly connected. Denote the first time that the state of the channel equals $s_0$ by $L$.
After time $L$, operate exactly as $p_1$ would (had time started then). Namely, for $t > L$,
$p_2(x_t|x^{t-1}, y^{t-1}, s_0') = p_1(x_{t-L}|x^{t-L-1}, y^{t-L-1}, s_0)$, where the conditioning sequences on the right-hand side are
those observed after time $L$. Then






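$$\begin{aligned}
\frac{1}{N} \left| I_{P_1}(X^N \to Y^N|s_0) - I_{P_2}(X^N \to Y^N|s_0') \right|
&\overset{(a)}{\le} \frac{1}{N} \left| I_{P_1}(X^N \to Y^N|s_0) - I_{P_2}(X^N \to Y^N|L, s_0') \right| + \frac{1}{N} H(L) \\
&\overset{(b)}{=} \frac{1}{N} \left| \sum_{l=1}^{\infty} p(L=l) I_{P_1}(X^N \to Y^N|s_0) - \sum_{l=1}^{\infty} p(L=l) \left( I_{P_2}(X_{l+1}^N \to Y_{l+1}^N|s_l) + I_{P_2}(X^l \to Y^l|s_l, s_0') \right) \right| + \frac{1}{N} H(L) \\
&\overset{(c)}{\le} \frac{1}{N} \left| \sum_{l=1}^{\infty} p(L=l) I_{P_1}(X^N \to Y^N|s_0) - \sum_{l=1}^{\infty} p(L=l) I_{P_2}(X_{l+1}^N \to Y_{l+1}^N|s_l) \right| \\
&\qquad + \frac{1}{N} \left| \sum_{l=1}^{\infty} p(L=l) I_{P_2}(X^l \to Y^l|s_l, s_0') \right| + \frac{1}{N} H(L) \\
&\overset{(d)}{\le} \frac{1}{N} \sum_{l=1}^{\infty} p(L=l)\, 2 l \log|\mathcal{Y}| + \frac{1}{N} H(L) \\
&= \frac{1}{N} \left( 2 \log|\mathcal{Y}|\, E[L] + H(L) \right), \qquad (13)
\end{aligned}$$

where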



(a) follows from the triangle inequality and Lemma 3 in [17], which claims that for arbitrary random variables
$(X^N, Y^N, S)$, the inequality $|I(X^N \to Y^N) - I(X^N \to Y^N|S)| \le H(S)$ always holds.

(b) follows from using the special structure of $p_2(x^N||y^{N-1}, s_0')$.

(c) follows from the triangle inequality.

(d) follows from the fact that in the first absolute value $N - l$ terms cancel and therefore only $l$ terms remain, where
each one of them is bounded by $I(X^t; Y_t|Y^{t-1}) \le \log|\mathcal{Y}|$. In the second absolute value there are $l$ terms also
bounded by $\log|\mathcal{Y}|$.

The proof is completed by noting that $H(L)$ and $E[L]$ are upper bounded, respectively, by $H(\tilde{L})$ and $E[\tilde{L}]$, where
$\tilde{L}/T \sim \text{Geometric}(p)$ and $p$ is the minimum probability of reaching $s_0$ in less than $T$ steps from any state $s \in \mathcal{S}$.
Because the random variable $\tilde{L}/T$ has a geometric distribution, $H(\tilde{L})$ and $E[\tilde{L}]$ are finite and, consequently, so
are $H(L)$ and $E[L]$. ■

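Theorem 3: The feedback capacity of a strongly connected unifilar FSC when initial state is known at the encoder
and decoder is given by

$$C_{FB} = \lim_{N \to \infty} \frac{1}{N} \max_{\{p(x_t|s_{t-1}, y^{t-1})\}_{t=1}^{N}} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|Y^{t-1}). \qquad (14)$$

Proof: The proof of the theorem contains four main equalities which are proven separately.

$$\begin{aligned}
C_{FB} &= \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N|s_0) \qquad &(15) \\
&= \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} I(X^N \to Y^N|S_0) \qquad &(16) \\
&= \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|Y^{t-1}) \qquad &(17) \\
&= \lim_{N \to \infty} \frac{1}{N} \max_{\{p(x_t|s_{t-1}, y^{t-1})\}_{t=1}^{N}} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|Y^{t-1}). \qquad &(18)
\end{aligned}$$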

Proof of equalities (15) and (16): As a result of Lemma 2,

$$\begin{aligned}
\lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} I(X^N \to Y^N|S_0)
&\overset{(a)}{=} \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \sum_{s_0} p(s_0) I(X^N \to Y^N|s_0) \\
&\overset{(b)}{=} \lim_{N \to \infty} \frac{1}{N} \sum_{s_0} p(s_0) \max_{p(x^N||y^{N-1}, s_0)} I(X^N \to Y^N|s_0) \\
&\overset{(c)}{=} \lim_{N \to \infty} \frac{1}{N} \min_{s_0} \max_{p(x^N||y^{N-1}, s_0)} I(X^N \to Y^N|s_0) \qquad &(19) \\
&\overset{(d)}{=} \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N|s_0), \qquad &(20)
\end{aligned}$$

where

(a) follows from the definition of conditional entropy.

(b) follows from exchanging between the summation and the maximization. The exchange is possible because
maximization is over causal conditioning distributions that depend on $s_0$.

(c) follows from Lemma 2.

(d) follows from the observation that the distribution $p^*(x^N||y^{N-1}, s_0)$ that achieves the maximum in (19) and in
(20) is the same: $p^*(x^N||y^{N-1}, s_0) = \arg\max_{p(x^N||y^{N-1}, s_0)} I(X^N \to Y^N|s_0)$. This observation allows us to
exchange the order of the minimum and the maximum.

Equations (19) and (20) can be repeated also with $\max_{s_0}$ instead of $\min_{s_0}$ and hence we get

$$\lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} I(X^N \to Y^N|S_0) = \lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \max_{s_0} I(X^N \to Y^N|s_0). \qquad (21)$$

By using eq. (20) and (21), we get that the upper bound and lower bound in (11) are equal and therefore eq.
(15) and (16) hold.

Proof of equality (17): Using the property that the next state of the channel is a deterministic function of the
input, output and current state we get,

$$\begin{aligned}
I(X^N \to Y^N|S_0) &= \sum_{t=1}^{N} I(X^t; Y_t|Y^{t-1}, S_0) \\
&= \sum_{t=1}^{N} H(Y_t|Y^{t-1}, S_0) - H(Y_t|X^t, Y^{t-1}, S_0) \\
&\overset{(a)}{=} \sum_{t=1}^{N} H(Y_t|Y^{t-1}, S_0) - H(Y_t|X^t, Y^{t-1}, S_0, S^{t-1}(X^t, Y^{t-1}, S_0)) \\
&\overset{(b)}{=} \sum_{t=1}^{N} H(Y_t|Y^{t-1}, S_0) - H(Y_t|X_t, S_{t-1}, Y^{t-1}, S_0) \\
&= \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|Y^{t-1}, S_0). \qquad (22)
\end{aligned}$$

Equality (a) is due to the fact that $s^{t-1}$ is a deterministic function of the tuple $(x^t, y^{t-1}, s_0)$. Equality (b) is due to
the fact that $p(y_t|x^t, y^{t-1}, s^{t-1}, s_0) = p(y_t|x_t, y^{t-1}, s_{t-1}, s_0)$. By combining eq. (16) and eq. (22) we get eq. (17).



Proof of equality (18): It will suffice to prove by induction that if we have two input distributions
$\{p_1(x_t|x^{t-1}, y^{t-1}, s_0)\}_{t \ge 1}$ and $\{p_2(x_t|x^{t-1}, y^{t-1}, s_0)\}_{t \ge 1}$ that induce the same distributions $\{p(x_t|s_{t-1}, y^{t-1})\}_{t \ge 1}$
then the distributions $\{p(s_{t-1}, x_t, y^t)\}_{t \ge 1}$ are equal under both inputs. First let us verify the equality for $t = 1$:

$$p(s_0, x_1, y_1) = p(s_0) p(x_1|s_0) p(y_1|s_0, x_1). \qquad (23)$$

Since $p(s_0)$ and $p(y_1|s_0, x_1)$ are not influenced by the input distribution and since $p(x_1|s_0)$ is equal for both input
distributions, $p(s_0, x_1, y_1)$ is also equal for both input distributions. Now, we assume that $p(s_{t-1}, x_t, y^t)$ is equal under
both input distributions and we need to prove that $p(s_t, x_{t+1}, y^{t+1})$ is also equal under both input distributions.
The term $p(s_t, x_{t+1}, y^{t+1})$ can be written as

$$p(s_t, x_{t+1}, y^{t+1}) = p(s_t, y^t) p(x_{t+1}|s_t, y^t) p(y_{t+1}|x_{t+1}, s_t). \qquad (24)$$

First we notice that if $p(s_{t-1}, x_t, y^t)$ is equal for both cases then necessarily $p(s_{t-1}, s_t, x_t, y^t)$ is also equal for
both cases because $s_t$ is a deterministic function of the tuple $(s_{t-1}, x_t, y_t)$, and therefore both input distributions
induce the same $p(s_t, y^t)$. The distribution $p(x_{t+1}|s_t, y^t)$ is the same under both input distributions by assumption
and $p(y_{t+1}|x_{t+1}, s_t)$ does not depend on the input distribution. ■

The next lemma shows that it is possible to switch between the limit and the maximization in the capacity 
formula. This is necessary for formulating the problem, as we do in the next section, as an average-reward dynamic 
program. 

Lemma 4: For any FSC the following equality holds:

$$\lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N|s_0) = \sup_{\{p(x_t|x^{t-1}, y^{t-1}, s_0)\}_{t \ge 1}} \liminf_{N \to \infty} \frac{1}{N} \min_{s_0} I(X^N \to Y^N|s_0). \qquad (25)$$

And, in particular, for a strongly connected unifilar FSC

$$\lim_{N \to \infty} \frac{1}{N} \max_{\{p(x_t|s_{t-1}, y^{t-1})\}_{t=1}^{N}} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|Y^{t-1}) = \sup_{\{p(x_t|s_{t-1}, y^{t-1})\}_{t \ge 1}} \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|Y^{t-1}). \qquad (26)$$

On the left-hand side of the equations appears lim because, as shown in [18], the limit exists due to the
super-additivity property of the sequence.

Proof: We are going to prove eq. (25), which holds for any FSC. For the case of a unifilar channel, the left-hand
side of eq. (25) is proven to be equal to the left-hand side of eq. (26) in eq. (15)-(18). By the same arguments as in
(15)-(18), the right-hand sides of (25) and (26) are also equal.
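
Define

$$\underline{C}_N = \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N|s_0). \qquad (27)$$

In order to prove that the equality holds we will use two properties of $\underline{C}_N$ that were proved in [18, Theorem 13].

The first property is that $\underline{C}_N$ is a super-additive sequence, namely, for $N = n + l$,

$$N \left[ \underline{C}_N - \frac{\log|\mathcal{S}|}{N} \right] \ge n \left[ \underline{C}_n - \frac{\log|\mathcal{S}|}{n} \right] + l \left[ \underline{C}_l - \frac{\log|\mathcal{S}|}{l} \right]. \qquad (28)$$

The second property, which is a result of the first, is that

$$\lim_{N \to \infty} \underline{C}_N = \sup_N \underline{C}_N. \qquad (29)$$

Now, consider

$$\begin{aligned}
\lim_{N \to \infty} \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N|s_0)
&= \sup_N \underline{C}_N \\
&= \sup_N \frac{1}{N} \max_{p(x^N||y^{N-1}, s_0)} \min_{s_0} I(X^N \to Y^N|s_0) \\
&= \sup_N \sup_{\{p(x_t|y^{t-1}, x^{t-1}, s_0)\}_{t \ge 1}} \frac{1}{N} \min_{s_0} I(X^N \to Y^N|s_0) \\
&= \sup_{\{p(x_t|y^{t-1}, x^{t-1}, s_0)\}_{t \ge 1}} \sup_N \frac{1}{N} \min_{s_0} I(X^N \to Y^N|s_0) \\
&\ge \sup_{\{p(x_t|y^{t-1}, x^{t-1}, s_0)\}_{t \ge 1}} \liminf_{N \to \infty} \frac{1}{N} \min_{s_0} I(X^N \to Y^N|s_0). \qquad (30)
\end{aligned}$$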



The limit on the left side of the equation in the lemma implies that, $\forall \epsilon > 0$, there exists $N(\epsilon)$ such that for
all $n > N(\epsilon)$, $\frac{1}{n} \max_{p(x^n||y^{n-1}, s_0)} \min_{s_0} I(X^n \to Y^n|s_0) \ge \sup_N \underline{C}_N - \epsilon$. Let us choose $j > N(\epsilon)$ and let
$p^*(x^j||y^{j-1})$ be the input distribution that attains the maximum. Let us construct

$$p(x^t||y^{t-1}, s_0) = p^*(x_{t-j+1}^{t}||y_{t-j+1}^{t-1}, s_{t-j})\, p^*(x_{t-2j+1}^{t-j}||y_{t-2j+1}^{t-j-1}, s_{t-2j}) \cdots \qquad (31)$$

Then we get,

$$\sup_{\{p(x_t|x^{t-1}, y^{t-1}, s_0)\}_{t \ge 1}} \liminf_{N \to \infty} \frac{1}{N} \min_{s_0} I(X^N \to Y^N|s_0) \ge \liminf_{N \to \infty} \frac{1}{N} \min_{s_0} I_p(X^N \to Y^N|s_0) \ge \sup_N \underline{C}_N - \epsilon \qquad (32)$$

where $I_p(X^N \to Y^N|s_0)$ is the directed information induced by the input $p(x^t||y^{t-1}, s_0)$ and the channel. The left
inequality holds because $p(x^t||y^{t-1}, s_0)$ is only one possible input distribution among all $\{p(x_t|x^{t-1}, y^{t-1}, s_0)\}_{t \ge 1}$. The
right inequality holds because the special structure of $p(x^t||y^{t-1}, s_0)$ transforms the normalized directed
information into an average of terms, each of which is the directed information over a block
of length $j$. Because the inequality holds for each block, it also holds for the average of the blocks. The
inequality may not hold on the last block, but because we average over an increasing number of blocks its influence
diminishes. Since $\epsilon$ is arbitrary, (30) and (32) together give (25).

■ 

V. Feedback Capacity and Dynamic Programming 

In this section, we characterize the feedback capacity of the unifilar FSC as the optimal average-reward of a 
dynamic program. Further, we present the Bellman equation, which can be solved to determine this optimal average 
reward. 



A. Dynamic Programs 

Here we introduce a formulation for average-reward dynamic programs. Each problem instance is defined by a
septuple $(\mathcal{Z}, \mathcal{U}, \mathcal{W}, F, P_z, P_w, g)$. We will explain the roles of these parameters.

We consider a discrete-time dynamic system evolving according to

$$z_t = F(z_{t-1}, u_t, w_t), \quad t = 1, 2, 3, \ldots, \qquad (33)$$

where each state $z_t$ takes values in a Borel space $\mathcal{Z}$, each action $u_t$ takes values in a compact subset $\mathcal{U}$ of a
Borel space, and each disturbance $w_t$ takes values in a measurable space $\mathcal{W}$. The initial state $z_0$ is drawn from
a distribution $P_z$. Each disturbance $w_t$ is drawn from a distribution $P_w(\cdot|z_{t-1}, u_t)$ which depends only on the
state $z_{t-1}$ and action $u_t$. All functions considered in this paper are assumed to be measurable, though we will not
mention this each time we introduce a function or set of functions.

The history $h_t = (z_0, w_0, \ldots, w_{t-1})$ summarizes information available prior to selection of the $t$th action. The
action $u_t$ is selected by a function $\mu_t$ which maps histories to actions. In particular, given a policy $\pi = \{\mu_1, \mu_2, \ldots\}$,
actions are generated according to $u_t = \mu_t(h_t)$. Note that given the history $h_t$ and a policy $\pi = \{\mu_1, \mu_2, \ldots\}$,
one can compute past states $z_1, \ldots, z_{t-1}$ and actions $u_1, \ldots, u_{t-1}$. A policy $\pi = \{\mu_1, \mu_2, \ldots\}$ is referred to as
stationary if there is a function $\mu : \mathcal{Z} \to \mathcal{U}$ such that $\mu_t(h_t) = \mu(z_{t-1})$ for all $t$ and $h_t$. With some abuse of
terminology, we will sometimes refer to such a function $\mu$ itself as a stationary policy.

We consider an objective of maximizing average reward, given a bounded reward function $g : \mathcal{Z} \times \mathcal{U} \to \mathbb{R}$. The
average reward for a policy $\pi$ is defined by

$$\rho_\pi = \liminf_{N \to \infty} \frac{1}{N} E_\pi \left\{ \sum_{t=0}^{N-1} g(Z_t, \mu_{t+1}(h_{t+1})) \right\},$$

where the subscript $\pi$ indicates that actions are generated by the policy $\pi = (\mu_1, \mu_2, \ldots)$. The optimal average
reward is defined by

$$\rho^* = \sup_\pi \rho_\pi.$$



B. The Bellman Equation 

An alternative characterization of the optimal average reward is offered by the Bellman Equation. This equation 
offers a mechanism for verifying that a given level of average reward is optimal. It also leads to a characterization 
of optimal policies. The following result which we will later use encapsulates the Bellman equation and its relation 
to the optimal average reward and optimal policies. 

Theorem 5: If $\rho \in \mathbb{R}$ and a bounded function $h : \mathcal{Z} \to \mathbb{R}$ satisfy

$$\rho + h(z) = \sup_{u \in \mathcal{U}} \left( g(z, u) + \int P_w(dw|z, u) h(F(z, u, w)) \right) \quad \forall z \in \mathcal{Z} \qquad (34)$$

then $\rho = \rho^*$. Further, if there is a function $\mu : \mathcal{Z} \to \mathcal{U}$ such that $\mu(z)$ attains the supremum for each $z$ then $\rho_\pi = \rho^*$
for $\pi = (\mu_1, \mu_2, \ldots)$ with $\mu_t(h_t) = \mu(z_{t-1})$ for each $t$.

This result follows immediately from Theorem 6.2 of [14]. It is convenient to define a dynamic programming
operator $T$ by

$$(Th)(z) = \sup_{u \in \mathcal{U}} \left( g(z, u) + \int P_w(dw|z, u) h(F(z, u, w)) \right)$$

for all functions $h$. Then, Bellman's equation can be written as $\rho \mathbf{1} + h = Th$, where $\mathbf{1}$ denotes the constant
function equal to one. It is also useful to define for each stationary policy $\mu$ an operator

$$(T_\mu h)(z) = g(z, \mu(z)) + \int P_w(dw|z, \mu(z)) h(F(z, \mu(z), w)).$$

The operators $T$ and $T_\mu$ obey some well-known properties. First, they are monotonic: for bounded functions $h$
and $\bar{h}$ such that $h \le \bar{h}$, $Th \le T\bar{h}$ and $T_\mu h \le T_\mu \bar{h}$. Second, they are non-expansive with respect to the sup-norm: for
bounded functions $h$ and $\bar{h}$, $\|Th - T\bar{h}\|_\infty \le \|h - \bar{h}\|_\infty$ and $\|T_\mu h - T_\mu \bar{h}\|_\infty \le \|h - \bar{h}\|_\infty$. Third, as a consequence
of nonexpansiveness, $T$ is continuous with respect to the sup-norm (footnote 3).

3 The proofs of the properties of $T$ are entirely analogous to the proofs of Propositions 1.2.1 and 1.2.4 in [13, Vol. II].

C. Feedback Capacity as a Dynamic Program

We will now formulate a dynamic program such that the optimal average reward equals the feedback capacity
of a unifilar channel as presented in Theorem 1. This entails defining the septuple $(\mathcal{Z}, \mathcal{U}, \mathcal{W}, F, P_z, P_w, g)$ based
on properties of the unifilar channel and then verifying that the optimal average reward is equal to the capacity of
the channel.

Let $\beta_t$ denote the $|\mathcal{S}|$-dimensional vector of channel state probabilities given information available to the decoder
at time $t$. In particular, each component corresponds to a channel state $s_t$ and is given by $\beta_t(s_t) = p(s_t|y^t)$. We
take states of the dynamic program to be $z_t = \beta_t$. Hence, the state space $\mathcal{Z}$ is the $|\mathcal{S}|$-dimensional unit simplex.
Each action $u_t$ is taken to be the matrix of conditional probabilities of the input $x_t$ given the previous state $s_{t-1}$
of the channel. Hence, the action space $\mathcal{U}$ is the set of stochastic matrices of dimension $|\mathcal{S}| \times |\mathcal{X}|$. The disturbance
$w_t$ is taken to be the channel output $y_t$. The disturbance space $\mathcal{W}$ is the output alphabet $\mathcal{Y}$.

The initial state distribution $P_z$ is concentrated at the prior distribution of the initial channel state $s_0$. Note
that the channel state $s_t$ is conditionally independent of the past given the previous channel state $s_{t-1}$, the input
probabilities $u_t$, and the current output $y_t$. Hence, $\beta_t(s_t) = p(s_t|y^t) = p(s_t|\beta_{t-1}, u_t, y_t)$. More concretely, given a
policy $\pi = (\mu_1, \mu_2, \ldots)$,

$$\begin{aligned}
\beta_t(s_t) &= p(s_t|y^t) \\
&= \frac{\sum_{x_t, s_{t-1}} p(s_t, s_{t-1}, x_t, y_t|y^{t-1})}{p(y_t|y^{t-1})} \\
&= \frac{\sum_{x_t, s_{t-1}} p(s_{t-1}|y^{t-1}) p(x_t|s_{t-1}, y^{t-1}) p(y_t|s_{t-1}, x_t) p(s_t|s_{t-1}, x_t, y_t)}{p(y_t|y^{t-1})} \\
&= \frac{\sum_{x_t, s_{t-1}} \beta_{t-1}(s_{t-1}) p(x_t|s_{t-1}, y^{t-1}) p(y_t|s_{t-1}, x_t) p(s_t|s_{t-1}, x_t, y_t)}{\sum_{x_t, s_t, s_{t-1}} \beta_{t-1}(s_{t-1}) p(x_t|s_{t-1}, y^{t-1}) p(y_t|s_{t-1}, x_t) p(s_t|s_{t-1}, x_t, y_t)} \\
&= \frac{\sum_{x_t, s_{t-1}} \beta_{t-1}(s_{t-1}) u_t(s_{t-1}, x_t) p(y_t|s_{t-1}, x_t) 1(s_t = f(s_{t-1}, x_t, y_t))}{\sum_{x_t, s_t, s_{t-1}} \beta_{t-1}(s_{t-1}) u_t(s_{t-1}, x_t) p(y_t|s_{t-1}, x_t) 1(s_t = f(s_{t-1}, x_t, y_t))}, \qquad (35)
\end{aligned}$$

where $1(\cdot)$ is the indicator function. Note that $p(y_t|s_{t-1}, x_t)$ is given by the channel model. Hence, $\beta_t$ is determined
by $\beta_{t-1}$, $u_t$, and $y_t$, and therefore, there is a function $F$ such that $z_t = F(z_{t-1}, u_t, w_t)$.
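
The update in eq. (35) is a Bayes filter on the channel state. A minimal sketch for a generic unifilar FSC follows; the function name `belief_update` and its argument layout are assumptions for illustration, with states, inputs and outputs indexed from 0.

```python
def belief_update(beta_prev, u, y, p_out, f, n_inputs):
    """One step of eq. (35): beta_t = F(beta_{t-1}, u_t, y_t).

    beta_prev[s]   = p(s_{t-1} = s | y^{t-1})
    u[s][x]        = p(x_t = x | s_{t-1} = s), a row-stochastic matrix
    p_out(y, x, s) = channel law p(y_t | x_t, s_{t-1})
    f(s, x, y)     = unifilar next-state function of eq. (1)
    """
    n_states = len(beta_prev)
    unnormalized = [0.0] * n_states
    for s_prev in range(n_states):
        for x in range(n_inputs):
            w = beta_prev[s_prev] * u[s_prev][x] * p_out(y, x, s_prev)
            if w > 0:
                unnormalized[f(s_prev, x, y)] += w   # indicator 1(s_t = f(...))
    total = sum(unnormalized)                        # equals p(y_t | y^{t-1})
    if total == 0:
        raise ValueError("output y has zero probability under (beta_prev, u)")
    return [v / total for v in unnormalized]

# Trapdoor channel ingredients (channel law and eq. (3)).
def trapdoor_p_out(y, x, s):
    return (1.0 if y == x else 0.0) if x == s else 0.5

def trapdoor_f(s, x, y):
    return s ^ x ^ y

beta = belief_update([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], 0,
                     trapdoor_p_out, trapdoor_f, n_inputs=2)
print(beta)
```

The normalizing constant computed inside the function is the denominator of eq. (35), so the same loop also yields the output probability $p(y_t|y^{t-1})$ if it is needed.
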

The distribution of the disturbance $w_t$ is $p(w_t|z^{t-1}, w^{t-1}, u^t) = p(w_t|z_{t-1}, u_t)$. Conditional independence from
$z^{t-2}$ and $w^{t-1}$ given $z_{t-1}$ is due to the fact that the channel output is determined by the previous channel state
and current input. More concretely,

$$\begin{aligned}
p(w_t|z^{t-1}, w^{t-1}, u^t) &= p(y_t|\beta^{t-1}, y^{t-1}, u^t) \\
&= \sum_{x_t, s_{t-1}} p(s_{t-1}|\beta_{t-1}, u_t) p(x_t|s_{t-1}, \beta_{t-1}, u_t) p(y_t|x_t, s_{t-1}, \beta_{t-1}, u_t) \\
&= \sum_{x_t, s_{t-1}} p(s_{t-1}, x_t, y_t|\beta_{t-1}, u_t) \\
&= p(y_t|\beta_{t-1}, u_t) \\
&= p(w_t|z_{t-1}, u_t). \qquad (36)
\end{aligned}$$

Hence, there is a disturbance distribution $P_w(\cdot|z_{t-1}, u_t)$ that depends only on $z_{t-1}$ and $u_t$.

We consider a reward of $I(Y_t; X_t, S_{t-1}|y^{t-1})$. Note that the reward depends only on the probabilities
$p(x_t, y_t, s_{t-1}|y^{t-1})$ for all $x_t$, $y_t$ and $s_{t-1}$. Further,

$$\begin{aligned}
p(x_t, y_t, s_{t-1}|y^{t-1}) &= p(s_{t-1}|y^{t-1}) p(x_t|s_{t-1}, y^{t-1}) p(y_t|x_t, s_{t-1}) \\
&= \beta_{t-1}(s_{t-1}) u_t(s_{t-1}, x_t) p(y_t|x_t, s_{t-1}). \qquad (37)
\end{aligned}$$

Recall that $p(y_t|x_t, s_{t-1})$ is given by the channel model. Hence, the reward depends only on $\beta_{t-1}$ and $u_t$.
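
Since the reward is a mutual information computed from the joint law in eq. (37), it can be evaluated directly from $\beta_{t-1}$, $u_t$ and the channel model. The sketch below is one possible implementation in bits; the function name `reward` and its argument layout are assumptions for illustration.

```python
import math

def reward(beta_prev, u, p_out, n_inputs, n_outputs):
    """I(X_t, S_{t-1}; Y_t | beta_{t-1}, u_t) in bits, using the joint law of eq. (37)."""
    joint = {}                                   # p(x_t, y_t, s_{t-1} | y^{t-1})
    for s in range(len(beta_prev)):
        for x in range(n_inputs):
            for y in range(n_outputs):
                p = beta_prev[s] * u[s][x] * p_out(y, x, s)
                if p > 0:
                    joint[s, x, y] = p
    p_y = [0.0] * n_outputs
    for (s, x, y), p in joint.items():
        p_y[y] += p
    # I(X, S; Y) = sum over (s, x, y) of p(s, x, y) * log2(p(y | x, s) / p(y))
    return sum(p * math.log2(p_out(y, x, s) / p_y[y])
               for (s, x, y), p in joint.items())

def trapdoor_p_out(y, x, s):
    return (1.0 if y == x else 0.0) if x == s else 0.5

print(reward([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], trapdoor_p_out, 2, 2))
```

For the trapdoor channel with a uniform belief, uniform inputs give a reward of 0.5 bits, while always sending a ball equal to the state gives 1 bit for that step at the cost of changing the next belief.
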

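Given an initial state $z_0$ and a policy $\pi = (\mu_1, \mu_2, \ldots)$, $u_t$ and $\beta_{t-1}$ are determined by $y^{t-1}$. Further, $(X_t, S_{t-1}, Y_t)$
is conditionally independent of $y^{t-1}$ given $\beta_{t-1}$ as shown in (37). Hence,

$$g(z_{t-1}, u_t) = I(Y_t; X_t, S_{t-1}|y^{t-1}) = I(X_t, S_{t-1}; Y_t|\beta_{t-1}, u_t). \qquad (38)$$

It follows that the optimal average reward is

$$\rho^* = \sup_\pi \liminf_{N \to \infty} \frac{1}{N} E_\pi \left[ \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|\beta_{t-1}, u_t) \right] = C_{FB}.$$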



The dynamic programming formulation that is presented here is an extension of the formulation presented in 
[10] by Yang, Kavcic and Tatikonda. In [10] the formulation is for channels with the property that the state is 
deterministically determined by the previous inputs and here we allow the state to be determined by the previous 
outputs and inputs. 



VI. Solution for the Trapdoor Channel 

The trapdoor channel presented in Section II is a simple example of a unifilar FSC. In this section, we present an
explicit solution to the associated dynamic program, which yields the feedback capacity of the trapdoor channel as 
well as an optimal encoder-decoder pair. The analysis begins with a computational study using numerical dynamic 
programming techniques. The results give rise to conjectures about the average reward, the differential value function, 
and an optimal policy. These conjectures are proved to be true through verifying that they satisfy Bellman's equation. 



A. The Dynamic Program 

In Section IV-C we formulated a class of dynamic programs associated with unifilar channels. From here on we
will focus on the particular instance from this class that represents the trapdoor channel. 

Using the same notation as in Section IV-C, the state $z_{t-1}$ would be the vector of channel state probabilities $[p(s_{t-1} = 0 \mid y^{t-1}),\, p(s_{t-1} = 1 \mid y^{t-1})]$. However, to simplify notation, we will consider the state $z_t$ to be the first component; that is, $z_{t-1} = p(s_{t-1} = 0 \mid y^{t-1})$. This comes with no loss of generality: the second component can be derived from the first since the pair sum to one. The action is a $2 \times 2$ stochastic matrix

$$
u_t = \begin{bmatrix} p(x_t = 0 \mid s_{t-1} = 0) & p(x_t = 1 \mid s_{t-1} = 0) \\ p(x_t = 0 \mid s_{t-1} = 1) & p(x_t = 1 \mid s_{t-1} = 1) \end{bmatrix}. \tag{39}
$$

The disturbance $w_t$ is the channel output $y_t$.

The state evolves according to $z_t = F(z_{t-1}, u_t, w_t)$, where using relations from eqs. (3) and (35) and Table I we obtain the function $F$ explicitly as

$$
z_t = \begin{cases}
\dfrac{z_{t-1} u_t(1,1)}{z_{t-1} u_t(1,1) + 0.5\, z_{t-1} u_t(1,2) + 0.5\,(1 - z_{t-1}) u_t(2,1)} & \text{if } w_t = 0, \\[2ex]
\dfrac{0.5\,(1 - z_{t-1}) u_t(2,1) + 0.5\, z_{t-1} u_t(1,2)}{0.5\,(1 - z_{t-1}) u_t(2,1) + 0.5\, z_{t-1} u_t(1,2) + (1 - z_{t-1}) u_t(2,2)} & \text{if } w_t = 1.
\end{cases}
$$

These expressions can be simplified by defining

$$
\gamma_t \triangleq (1 - z_{t-1})\, u_t(2,2), \tag{40}
$$

$$
\delta_t \triangleq z_{t-1}\, u_t(1,1). \tag{41}
$$

Note that, given $z_{t-1}$, the action $u_t$ defines the pair $(\gamma_t, \delta_t)$ and vice-versa. From here on we will represent the action in terms of $\gamma_t$ and $\delta_t$. Because $u_t$ is required to be a stochastic matrix, $\delta_t$ and $\gamma_t$ are constrained by $0 \le \delta_t \le z_{t-1}$ and $0 \le \gamma_t \le 1 - z_{t-1}$.

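Recall from eq. (38) that the reward function is given by $g(z_{t-1}, u_t) = I(X_t, S_{t-1}; Y_t \mid \beta_{t-1}, u_t)$. This reward can be computed from the conditional probabilities $p(x_t, s_{t-1}, y_t \mid \beta_{t-1}, u_t)$. Using the expressions for these conditional probabilities provided in Table II we obtain

$$
\begin{aligned}
g(z_{t-1}, u_t) &= I(X_t, S_{t-1}; Y_t \mid \beta_{t-1}, u_t) \\
&= H(Y_t \mid \beta_{t-1}, u_t) - H(Y_t \mid X_t, S_{t-1}, \beta_{t-1}, u_t) \\
&= H\!\left( z_{t-1} u_t(1,1) + \frac{z_{t-1} u_t(1,2)}{2} + \frac{(1 - z_{t-1}) u_t(2,1)}{2} \right) - z_{t-1} u_t(1,2) - (1 - z_{t-1}) u_t(2,1) \\
&= H\!\left( \frac{1}{2} + \frac{\delta_t - \gamma_t}{2} \right) + \delta_t + \gamma_t - 1,
\end{aligned}
$$

where, with some abuse of notation, we use $H$ to denote the binary entropy function: $H(q) = -q \log_2 q - (1 - q) \log_2 (1 - q)$. The logarithm is taken to base 2, so that an equiprobable output contributes one bit and the last line agrees with the reward term in the dynamic programming operator.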



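TABLE II

The conditional distribution $p(x_t, s_{t-1}, y_t \mid \beta_{t-1}, u_t)$, where $\beta_{t-1}$ is shorthand for $\beta_{t-1}(0) = z_{t-1}$.

$x_t$   $s_{t-1}$   $y_t = 0$                            $y_t = 1$
0       0           $\beta_{t-1} u_t(1,1)$               0
0       1           $0.5 (1 - \beta_{t-1}) u_t(2,1)$     $0.5 (1 - \beta_{t-1}) u_t(2,1)$
1       0           $0.5 \beta_{t-1} u_t(1,2)$           $0.5 \beta_{t-1} u_t(1,2)$
1       1           0                                    $(1 - \beta_{t-1}) u_t(2,2)$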



We now have a dynamic program: the objective is to maximize over all policies $\pi$ the average reward $\rho_\pi$. The capacity of the trapdoor channel is the maximum of the average reward, $\rho^*$. In the context of the trapdoor channel, the dynamic programming operator takes the form

$$
(Th)(z) = \sup_{0 \le \delta \le z,\; 0 \le \gamma \le 1 - z} \left( H\!\left( \frac{1}{2} + \frac{\delta - \gamma}{2} \right) + \delta + \gamma - 1 + \frac{1 + \delta - \gamma}{2}\, h\!\left( \frac{2\delta}{1 + \delta - \gamma} \right) + \frac{1 - \delta + \gamma}{2}\, h\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right) \right). \tag{42}
$$

By Theorem 5, if we identify a scalar $\rho$ and bounded function $h$ that satisfy Bellman's equation, $\rho \mathbf{1} + h = Th$, then $\rho$ is the optimal average reward. Further, if for each $z$, $T_\mu h = Th$, then the stationary policy $\mu$ is an optimal policy.

B. Computational Study 

We carried out computations to develop an understanding of solutions to Bellman's equation. For this purpose, 
we used the value iteration algorithm, which in our context generates a sequence of iterates according to 

$$
J_{k+1} = T J_k, \tag{43}
$$

initialized with $J_0 = 0$. For each $k$ and $z$, $J_k(z)$ is the maximal expected reward over $k$ time periods given that the system starts in state $z$. Since rewards are positive, for each $z$, $J_k(z)$ grows with $k$. For each $k$, we define a differential reward function $h_k(z) = J_k(z) - J_k(0)$. These functions capture differences among values $J_k(z)$ for different states $z$. Under certain conditions such as those presented in [26], the sequence $h_k$ converges uniformly to
a function that solves Bellman's equation. We will neither discuss such conditions nor verify that they hold. Rather, 
we will use the algorithm heuristically in order to develop intuition and conjectures. 

Value iteration as described above cannot be implemented on a computer because it requires storing and updating a function with infinite domain and optimizing over an infinite number of actions. To address this, we discretize the state and action spaces, approximating the state space using a uniform grid with 2000 points in the unit interval and restricting actions $\delta$ and $\gamma$ to values in a uniform grid with 4000 points in the unit interval.
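
The discretized value iteration described above can be sketched in a few lines. This is a minimal illustration, not the computation used for the paper: the grid sizes are much coarser than the 2000 and 4000 points quoted above, the action grids are rescaled to the feasible intervals $[0, z]$ and $[0, 1-z]$, and $h$ is evaluated off-grid by linear interpolation. The function names `binary_entropy`, `bellman_operator`, and `value_iteration` are introduced here for illustration only.

```python
import numpy as np


def binary_entropy(q):
    """Binary entropy in bits, using the convention 0 * log(0) = 0."""
    q = np.clip(q, 1e-12, 1.0 - 1e-12)
    return -q * np.log2(q) - (1.0 - q) * np.log2(1.0 - q)


def bellman_operator(h, z_grid, n_actions=41):
    """Apply a grid approximation of T from eq. (42) to h sampled on z_grid."""
    Th = np.empty_like(h)
    for i, z in enumerate(z_grid):
        delta = np.linspace(0.0, z, n_actions)[:, None]        # 0 <= delta <= z
        gamma = np.linspace(0.0, 1.0 - z, n_actions)[None, :]  # 0 <= gamma <= 1 - z
        p0 = 0.5 * (1.0 + delta - gamma)   # probability of output 0
        p1 = 1.0 - p0                      # probability of output 1
        # Next states; the guards avoid 0/0 when an output has probability zero.
        z0 = np.where(p0 > 0, delta / np.maximum(p0, 1e-300), 0.0)
        z1 = np.where(p1 > 0, 1.0 - gamma / np.maximum(p1, 1e-300), 0.0)
        h0 = np.interp(z0.ravel(), z_grid, h).reshape(z0.shape)
        h1 = np.interp(z1.ravel(), z_grid, h).reshape(z1.shape)
        value = binary_entropy(p0) + delta + gamma - 1.0 + p0 * h0 + p1 * h1
        Th[i] = value.max()
    return Th


def value_iteration(n_states=41, n_actions=41, n_iter=20):
    """Run n_iter steps of J_{k+1} = T J_k from J_0 = 0; also return the last increment."""
    z_grid = np.linspace(0.0, 1.0, n_states)
    J = np.zeros(n_states)
    increment = np.zeros(n_states)
    for _ in range(n_iter):
        J_next = bellman_operator(J, z_grid, n_actions)
        increment, J = J_next - J, J_next
    return z_grid, J, increment


if __name__ == "__main__":
    z_grid, J, increment = value_iteration()
    print("range of last increment:", increment.min(), increment.max())
    print("differential values h_k:", J - J[0])
```

The last increment $J_{k+1} - J_k$ brackets the average reward of the discretized problem, which gives a quick indication of how far the iteration is from convergence.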

We executed twenty value iterations. Figure 4 plots the function $J_{20}$ and actions that maximize the right-hand side of eq. (43) with $k = 20$. We also simulated the system, selecting actions $\delta_t$ and $\gamma_t$ in each time period to maximize this expression. This led to an average reward of approximately 0.694. We plot in the bottom-right panel of Figure 4 the relative state frequencies of the associated Markov process. Note that the distribution concentrates around four points which are approximately 0.236, 0.382, 0.618, and 0.764.



Fig. 4. Results from 20 value iterations. On the top-left side the value function $J_{20}$ is plotted. On the top-right and bottom-left the optimal action-parameters $\delta$ and $\gamma$ with respect to the 20th iteration are plotted. On the bottom-right the relative state frequencies of the associated Markov process of $z$ with the policy that is optimal with respect to $J_{20}$ are plotted.



C. Conjectures 

The results obtained from value iteration were, amazingly, close to the answers to two questions given in an information theory class at Stanford taught by Professor Thomas Cover. Here is a simplified version of the questions given to the class.

(1) Entropy rate. Find the maximum entropy rate of the two-state Markov chain (Fig. 5) with transition matrix

$$
P = \begin{bmatrix} 1 - p & p \\ 1 & 0 \end{bmatrix}, \tag{44}
$$

where $0 \le p \le 1$ is the free parameter we maximize over.

Fig. 5. The Markov chain of question 1.

(2) Number of sequences. To first order in the exponent, what is the number of binary sequences of length $n$ with no two 1's in a row?

The entropy rate of the Markov chain of question (1) is given by $\frac{H(p)}{1+p}$, and when maximizing over $0 \le p \le 1$, we get that $p = \frac{3 - \sqrt{5}}{2}$ and the entropy rate is 0.6942. It can be shown that the number of sequences of length $n - 1$ that do not have two 1's in a row is the $n$th number in the Fibonacci sequence. This can be proved by induction in the following way. Let us denote by $(N_n^0, N_n^1)$ the number of sequences of length $n$ with the condition of not having two 1's in a row that are ending with '0' and with '1' respectively. For the sequences that end with '0' we can either add a next bit '1' or '0' and for the sequences that end with '1' we can add only '0'. Hence $N_{n+1}^0 = N_n^0 + N_n^1$ and $N_{n+1}^1 = N_n^0$. By repeating this logic, we get that $N_n^0$ behaves as a Fibonacci sequence. To first order in the exponent, the Fibonacci number behaves as $\lim_{n \to \infty} \frac{1}{n} \log f_n = \log \frac{\sqrt{5} + 1}{2} = 0.6942$, where the number $\phi = \frac{\sqrt{5} + 1}{2}$ is called the golden ratio. The golden ratio is also known to be a positive number that solves the equation $\frac{1}{\phi} = \phi - 1$, and it appears in many math, science and art problems [27]. As these problems illustrate, the number of typical sequences created by the Markov process given in question (1) is, to first order in the exponent, equal to the number of binary sequences that do not have two 1's in a row.
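
Both answers are easy to check numerically. The sketch below counts the constrained sequences with the recursion for $(N_n^0, N_n^1)$ given above and evaluates the entropy rate $H(p)/(1+p)$ in bits; the function names and the grid search over $p$ are illustrative choices, not part of the original analysis.

```python
import math


def count_no_consecutive_ones(n):
    """Number of binary sequences of length n with no two 1's in a row."""
    if n == 0:
        return 1
    end0, end1 = 1, 1  # length-1 sequences ending in '0' and in '1'
    for _ in range(n - 1):
        # A '0' may follow anything; a '1' may only follow a '0'.
        end0, end1 = end0 + end1, end0
    return end0 + end1


def entropy_rate(p):
    """Entropy rate H(p) / (1 + p), in bits, of the chain in question (1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return h / (1.0 + p)


if __name__ == "__main__":
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    grid = [i / 10000.0 for i in range(10001)]
    p_best = max(grid, key=entropy_rate)
    print("maximizing p:", p_best, "entropy rate:", entropy_rate(p_best))
    print("log2 of the golden ratio:", math.log2(phi))
    n = 200
    print("growth exponent:", math.log2(count_no_consecutive_ones(n)) / n)
```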

Let us consider a policy for the dynamic program associated with a binary random process that is created by the Markov chain from question 1 (see Fig. 5). Let the state of the Markov process indicate if the input to the channel will be the same or different from the state of the channel. In other words, if at time $t$ the binary Markov sequence is '0' then the input to the channel is equal to the state of the channel, i.e. $x_t = s_{t-1}$. Otherwise, the input to the channel is a complement to the state of the channel, i.e. $x_t = s_{t-1} \oplus 1$. This scheme uniquely defines the distribution $p(x_t \mid s_{t-1}, y^{t-1})$:

$$
p(X_t = s_{t-1} \mid s_{t-1}, y^{t-1}) = \begin{cases} 1 - p & \text{if } s_{t-1} = y_{t-1}, \\ 1 & \text{if } s_{t-1} \ne y_{t-1}, \end{cases} \tag{45}
$$

where $p = \frac{3 - \sqrt{5}}{2}$ is the maximizing parameter from question (1). This distribution is derived from the fact that for the trapdoor channel the state evolves according to equation (3), which can be written as

$$
s_{t-1} \oplus y_{t-1} = x_{t-1} \oplus s_{t-2}. \tag{46}
$$

Hence, if $s_{t-1} \ne y_{t-1}$ then necessarily also $x_{t-1} \ne s_{t-2}$. This means that the tuple $(s_{t-1}, y_{t-1})$ defines the state of the Markov chain at time $t-1$ and the tuple $(x_t, s_{t-1})$ defines the state of the Markov chain at time $t$. Having the distribution $p(x_t \mid s_{t-1}, y^{t-1})$, for the following four values of $z$, $\{ b_1 = \sqrt{5} - 2,\ b_2 = \frac{3 - \sqrt{5}}{2},\ b_3 = \frac{\sqrt{5} - 1}{2},\ b_4 = 3 - \sqrt{5} \}$, the corresponding actions $\tilde\gamma(z)$ and $\tilde\delta(z)$, which are defined in eqs. (40) and (41), are:

$z$              $\tilde\gamma(z)$                   $\tilde\delta(z)$
$b_1$ or $b_2$   $\frac{\sqrt{5}-1}{2} (1 - z)$      $z$
$b_3$ or $b_4$   $1 - z$                             $\frac{\sqrt{5}-1}{2} z$



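It can be verified, by using eq. (35), that the only values of $z$ ever reached are

$$
z \in \left\{ \sqrt{5} - 2,\ \frac{3 - \sqrt{5}}{2},\ \frac{\sqrt{5} - 1}{2},\ 3 - \sqrt{5} \right\}, \tag{47}
$$

and the transitions are a function of $y_t$ shown graphically in Figure 6. Our goal is to prove that an extension of this policy is indeed optimal. Based on the result of Question 1, we conjecture that the average reward is

$$
\tilde\rho = \frac{H\!\left( \frac{3 - \sqrt{5}}{2} \right)}{1 + \frac{3 - \sqrt{5}}{2}} = \log \frac{\sqrt{5} + 1}{2} \approx 0.6942. \tag{48}
$$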



It is interesting to notice that all the numbers appearing above can be written in terms of the golden ratio, $\phi = \frac{\sqrt{5} + 1}{2}$. In particular, $\tilde\rho = \log \phi$, $b_1 = 2\phi - 3$, $b_2 = 2 - \phi$, $b_3 = \phi - 1$ and $b_4 = 4 - 2\phi$.
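
The claim that this policy keeps the state inside $\{b_1, b_2, b_3, b_4\}$ and earns an average reward of $\log \phi$ can be checked with a small computation. The sketch below is illustrative only: it applies the state update and reward in terms of $(\gamma, \delta)$ to the four points, builds the resulting four-state transition matrix, and averages the reward under its stationary distribution. All function names are introduced here for the example.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
B = np.array([SQRT5 - 2, (3 - SQRT5) / 2, (SQRT5 - 1) / 2, 3 - SQRT5])  # b1..b4
C = (SQRT5 - 1) / 2


def action(z):
    """(gamma, delta) at z in {b1, b2, b3, b4}, following the table above."""
    return (C * (1 - z), z) if z < 0.5 else (1 - z, C * z)


def step(z):
    """Probability of output 0, the two successor states, and the reward at z."""
    gamma, delta = action(z)
    p0 = 0.5 * (1 + delta - gamma)
    reward = -p0 * np.log2(p0) - (1 - p0) * np.log2(1 - p0) + delta + gamma - 1
    return p0, delta / p0, 1 - gamma / (1 - p0), reward


def chain():
    """Transition matrix on {b1..b4}, rewards, and the largest distance of any
    successor state from the four-point set."""
    P, rewards, worst = np.zeros((4, 4)), np.zeros(4), 0.0
    for i, z in enumerate(B):
        p0, z0, z1, rewards[i] = step(z)
        for prob, z_next in ((p0, z0), (1 - p0, z1)):
            j = int(np.argmin(np.abs(B - z_next)))
            worst = max(worst, abs(B[j] - z_next))
            P[i, j] += prob
    return P, rewards, worst


def average_reward():
    """Average reward under the stationary distribution of the four-state chain."""
    P, rewards, _ = chain()
    # Solve pi P = pi together with sum(pi) = 1 in the least-squares sense.
    A = np.vstack([P.T - np.eye(4), np.ones(4)])
    rhs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    pi = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return float(pi @ rewards), pi


if __name__ == "__main__":
    rho, pi = average_reward()
    print("stationary distribution:", pi)
    print("average reward:", rho, "log2(phi):", np.log2((1 + SQRT5) / 2))
```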

By inspection of Figure 4 we let $\tilde\gamma$ and $\tilde\delta$ be linear over the intervals $[b_1, b_2]$, $[b_2, b_3]$, and $[b_3, b_4]$ and we get the form presented in Table III.

Fig. 6. The transition between $\beta_{t-1}$ and $\beta_t$, under the policy $\tilde\delta, \tilde\gamma$.

TABLE III

Conjectured policy which in the next section will be proven to be true.

$z$                     $\tilde\gamma(z)$                   $\tilde\delta(z)$
$b_1 \le z \le b_2$     $\frac{\sqrt{5}-1}{2} (1 - z)$      $z$
$b_2 \le z \le b_3$     $\frac{3-\sqrt{5}}{2}$              $\frac{3-\sqrt{5}}{2}$
$b_3 \le z \le b_4$     $1 - z$                             $\frac{\sqrt{5}-1}{2} z$



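We now propose differential values $\tilde h(z)$ for $z \in [b_1, b_4]$. If we assume that $\tilde\delta$ and $\tilde\gamma$ maximize the right-hand side of the Bellman equation (eq. (34)) for $z \in [b_1, b_4]$ with $h = \tilde h$ and $\rho = \tilde\rho$, we obtain

$$
\tilde h(z) = H\!\left( \tfrac{1}{2} \right) - (\sqrt{5} - 2) - \tilde\rho + \tilde h(3 - \sqrt{5}), \qquad b_2 \le z \le b_3, \tag{49}
$$

$$
\tilde h(z) = H\!\left( \frac{\sqrt{5}+1}{4} z \right) - \frac{3 - \sqrt{5}}{2} z - \tilde\rho + \frac{\sqrt{5}+1}{4} z\, \tilde h(3 - \sqrt{5}) + \left( 1 - \frac{\sqrt{5}+1}{4} z \right) \tilde h\!\left( 1 - \frac{4 (1 - z)}{4 - (\sqrt{5}+1) z} \right), \qquad b_3 \le z \le b_4. \tag{50}
$$

The equation for the range $b_1 \le z \le b_2$ is implied by the symmetry relation $\tilde h(z) = \tilde h(1 - z)$.

If a scalar $\rho$ and function $h$ solve Bellman's equation, so do $\rho$ and $h + c\mathbf{1}$ for any scalar $c$. Therefore, there is no loss of generality in setting $\tilde h(1/2) = 1$. From eq. (49) we have that

$$
\tilde h(z) = 1, \qquad b_2 \le z \le b_3. \tag{51}
$$

In addition, by symmetry considerations we can deduce that $\tilde h(\sqrt{5} - 2) = \tilde h(3 - \sqrt{5})$ and from eq. (49) we obtain

$$
\tilde h(\sqrt{5} - 2) = \tilde h(3 - \sqrt{5}) = \tilde\rho - 2 + \sqrt{5} \approx 0.9303. \tag{52}
$$

Taking symmetry into consideration and applying eq. (50) twice we obtain

$$
\tilde h(z) = H(z) + \tilde\rho z + c_1, \qquad b_3 \le z \le b_4, \tag{53}
$$

where $c_1 = \log(3 - \sqrt{5})$. By symmetry we obtain

$$
\tilde h(z) = H(z) - \tilde\rho z + c_2, \qquad b_1 \le z \le b_2, \tag{54}
$$

where $c_2 = \log(\sqrt{5} - 1)$.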

Fig. 7. A conjecture about the optimal solution based on the 20th value iteration of the DP, which is shown in Fig. 4, and on the questions given by Professor Cover. On the top-left the conjectured differential value $\tilde h(z)$ is plotted for $z \in [b_1, b_4]$. On the top-right side and bottom-left the conjectured policy $(\tilde\delta(z), \tilde\gamma(z))$ is plotted for $z \in [b_1, b_4]$.

The conjectured policy $(\tilde\gamma, \tilde\delta)$, which is given in Table III, and the conjectured differential value $\tilde h$, which is given in eqs. (51)-(54), are plotted in Fig. 7.
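
Before the formal verification, the conjectured pair $(\tilde\rho, \tilde h)$ can be sanity-checked numerically. The sketch below evaluates, on a grid in $[b_1, b_4]$, the residual of Bellman's equation when the action is fixed to the conjectured policy of Table III. It only checks that the conjectured policy reproduces $\tilde h$; it does not check that the policy attains the supremum. The function names are illustrative, and logarithms are taken to base 2.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
B1, B2, B3, B4 = SQRT5 - 2, (3 - SQRT5) / 2, (SQRT5 - 1) / 2, 3 - SQRT5
RHO = np.log2((1 + SQRT5) / 2)
C1, C2 = np.log2(3 - SQRT5), np.log2(SQRT5 - 1)


def H(q):
    """Binary entropy in bits for 0 < q < 1."""
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)


def h_tilde(z):
    """Conjectured differential value, eqs. (51), (53), (54), for b1 <= z <= b4."""
    if z < B2:
        return H(z) - RHO * z + C2
    if z <= B3:
        return 1.0
    return H(z) + RHO * z + C1


def policy(z):
    """Conjectured (gamma, delta) of Table III."""
    c = (SQRT5 - 1) / 2
    if z < B2:
        return c * (1 - z), z
    if z <= B3:
        return B2, B2
    return 1 - z, c * z


def bellman_residual(z):
    """Reward plus expected next value, minus rho and h(z), under the conjectured
    policy; this is zero when Bellman's equation holds at z."""
    gamma, delta = policy(z)
    p0 = 0.5 * (1 + delta - gamma)
    p1 = 1 - p0
    z0, z1 = delta / p0, 1 - gamma / p1
    # Clip tiny rounding excursions outside [b1, b4].
    z0, z1 = min(max(z0, B1), B4), min(max(z1, B1), B4)
    value = H(p0) + delta + gamma - 1 + p0 * h_tilde(z0) + p1 * h_tilde(z1)
    return value - RHO - h_tilde(z)


if __name__ == "__main__":
    worst = max(abs(bellman_residual(z)) for z in np.linspace(B1, B4, 201))
    print("largest Bellman residual on [b1, b4]:", worst)
```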

D. Verification 

In this section, we verify that the conjectures made in the previous section are correct. Our verification process proceeds as follows. First, we establish that if a function $h : [0, 1] \to \mathbb{R}$ is concave, so is $Th$. In other words, value iteration retains concavity. We then consider a version of value iteration involving an iteration $h_{k+1} = T h_k - \tilde\rho \mathbf{1}$. Since subtracting a constant does not affect concavity, this iteration also retains concavity. We prove that if a function $h_0$ is the pointwise maximum among concave functions that are equal to $\tilde h$ in the interval $[b_1, b_4]$ then each iterate $h_k$ is also concave and equal to $\tilde h$ in this interval. Further, the sequence is pointwise nonincreasing. These properties of the sequence imply that it converges to a function $h^*$ that again is concave and equal to $\tilde h$ in the interval $[b_1, b_4]$. This function $h^*$ together with $\tilde\rho$ satisfies Bellman's equation. Given this, Theorem 5 verifies our conjectures.

We begin with a lemma that will be useful in showing that value iteration retains concavity.

Lemma 6: Let $\zeta : [0, 1] \times [0, 1] \to \mathbb{R}$ be concave on $[0, z] \times [0, 1 - z]$ for all $z \in [0, 1]$ and

$$
\psi(z) = \sup_{\delta \in [0, z],\ \gamma \in [0, 1 - z]} \zeta(\delta, \gamma).
$$

Then $\psi : [0, 1] \to \mathbb{R}$ is concave.

The proof of Lemma 6 is given in the appendix.

Lemma 7: The operator $T$, defined in (42), retains concavity and continuity. Namely,

• if h is concave then Th is concave, 

• if h is continuous then Th is continuous. 
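
Proof (concavity): It is well-known that the binary entropy function $H$ is concave, so the reward function $H\!\left( \frac{1}{2} + \frac{\delta - \gamma}{2} \right) + \delta + \gamma - 1$ is concave in $(\delta, \gamma)$.

Next, we show that if $h(z)$ is concave then $\frac{1 + \delta - \gamma}{2}\, h\!\left( \frac{2\delta}{1 + \delta - \gamma} \right)$ is concave in $(\delta, \gamma)$. Let $\xi_1 = \frac{1 + \delta_1 - \gamma_1}{2}$ and $\xi_2 = \frac{1 + \delta_2 - \gamma_2}{2}$. We will show that, for any $\alpha \in (0, 1)$,

$$
\alpha \xi_1\, h\!\left( \frac{\delta_1}{\xi_1} \right) + (1 - \alpha) \xi_2\, h\!\left( \frac{\delta_2}{\xi_2} \right) \le \big( \alpha \xi_1 + (1 - \alpha) \xi_2 \big)\, h\!\left( \frac{\alpha \delta_1 + (1 - \alpha) \delta_2}{\alpha \xi_1 + (1 - \alpha) \xi_2} \right). \tag{55}
$$

Dividing both sides by $\alpha \xi_1 + (1 - \alpha) \xi_2$ we get

$$
\frac{\alpha \xi_1}{\alpha \xi_1 + (1 - \alpha) \xi_2}\, h\!\left( \frac{\delta_1}{\xi_1} \right) + \frac{(1 - \alpha) \xi_2}{\alpha \xi_1 + (1 - \alpha) \xi_2}\, h\!\left( \frac{\delta_2}{\xi_2} \right) \le h\!\left( \frac{\alpha \delta_1 + (1 - \alpha) \delta_2}{\alpha \xi_1 + (1 - \alpha) \xi_2} \right). \tag{56}
$$

Note that the last inequality is true because of the concavity of $h$. The same argument applies to the term $\frac{1 - \delta + \gamma}{2}\, h\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right)$. It follows that

$$
f(\delta, \gamma) = H\!\left( \frac{1}{2} + \frac{\delta - \gamma}{2} \right) + \delta + \gamma - 1 + \frac{1 + \delta - \gamma}{2}\, h\!\left( \frac{2\delta}{1 + \delta - \gamma} \right) + \frac{1 - \delta + \gamma}{2}\, h\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right) \tag{57}
$$

is concave in $(\delta, \gamma)$. Since

$$
(Th)(z) = \sup_{\delta \in [0, z],\ \gamma \in [0, 1 - z]} f(\delta, \gamma),
$$

it is concave by Lemma 6. ■

Proof (continuity): Note that the binary entropy function $H$ is continuous. Further, $h\!\left( \frac{2\delta}{1 + \delta - \gamma} \right)$ and $h\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right)$ are continuous over the region $\{ (\delta, \gamma) \mid \delta \ge 0,\ \gamma \ge 0,\ \delta + \gamma \le 1 \}$. It follows that $f(\delta, \gamma)$ is continuous over the region $\{ (\delta, \gamma) \mid \delta \ge 0,\ \gamma \ge 0,\ \delta + \gamma \le 1 \}$. Hence,

$$
(Th)(z) = \sup_{\delta \in [0, z],\ \gamma \in [0, 1 - z]} f(\delta, \gamma)
$$

is continuous over $[0, 1]$. ■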

Let us construct the value iteration functions $h_k(z)$ as follows. Let $h_0(z)$ be the pointwise maximum among concave functions satisfying $h_0(z) = \tilde h(z)$ for $z \in [b_1, b_4]$, where $\tilde h(z)$ is defined in eqs. (51)-(54). Note that $h_0(z)$ is concave and that for $z \notin [b_1, b_4]$, $h_0(z)$ is a linear extrapolation from the boundary of $[b_1, b_4]$. Let

$$
h_{k+1}(z) = (T h_k)(z) - \tilde\rho, \tag{58}
$$

and

$$
h^*(z) = \limsup_{k \to \infty} h_k(z). \tag{59}
$$

The following lemma shows several properties of the sequence of functions $h_k(z)$, including uniform convergence. The uniform convergence is needed for verifying the conjecture, while the other properties are intermediate steps in proving the uniform convergence.

Lemma 8: The following properties hold:

8.1 for all $k \ge 0$, $h_k(z)$ is concave and continuous in $z$;

8.2 for all $k \ge 0$, $h_k(z)$ is symmetric around $\frac{1}{2}$, i.e.

$$
h_k(z) = h_k(1 - z); \tag{60}
$$

8.3 for all $k \ge 0$, $h_k(z)$ is a fixed point for $z \in [b_1, b_4]$, i.e.,

$$
h_k(z) = \tilde h(z), \qquad z \in [b_1, b_4], \tag{61}
$$

and the stationary policy $\tilde\mu(z) = (\tilde\delta(z), \tilde\gamma(z))$, where $\tilde\delta(z), \tilde\gamma(z)$ are defined in Table III, satisfies $(T_{\tilde\mu} h_k)(z) = (T h_k)(z)$;

8.4 $h_k(z)$ is uniformly bounded in $k$ and $z$, i.e.,

$$
\sup_{k} \sup_{z \in [0, 1]} |h_k(z)| < \infty; \tag{62}
$$

8.5 $h_k(z)$ is monotonically nonincreasing in $k$, and

$$
\lim_{k \to \infty} h_k(z) = h^*(z); \tag{63}
$$

8.6 $h_k(z)$ converges uniformly to $h^*(z)$.

Proof of 8.1: Since $h_0(z)$ is concave and continuous and since the operator $T$ retains continuity and concavity (see Lemma 7), it follows that $h_k(z)$ is concave and continuous for every $k$. ■

Proof of 8.2: We prove this property by induction. First notice that $h_0(z)$ is symmetric and satisfies $h_0(z) = h_0(1 - z)$. Now let us show that if it holds for $h_k$ then it holds for $h_{k+1}$.

Let $f_k(\delta, \gamma)$ denote the expression maximized to obtain $(T h_k)(z)$, i.e.

$$
f_k(\delta, \gamma) = H\!\left( \frac{1}{2} + \frac{\delta - \gamma}{2} \right) + \delta + \gamma - 1 + \frac{1 + \delta - \gamma}{2}\, h_k\!\left( \frac{2\delta}{1 + \delta - \gamma} \right) + \frac{1 - \delta + \gamma}{2}\, h_k\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right). \tag{64}
$$

Notice that $f_k(\delta, \gamma) = f_k(\gamma, \delta)$. Also observe that replacing the argument $z$ with $1 - z$ in $T h_k$ yields the same result as exchanging $\gamma$ and $\delta$. From those two observations it follows that $(T h_k)(z) = (T h_k)(1 - z)$, and from the definition of $h_{k+1}$ given in (58) it follows that $h_{k+1}(z) = h_{k+1}(1 - z)$. ■
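Proof of 8.3: We prove this property by induction. Notice that $h_0$ satisfies $h_0(z) = \tilde h(z)$ for $z \in [b_1, b_4]$. We assume that $h_k$ satisfies $h_k(z) = \tilde h(z)$ and then we will prove the property for $h_{k+1}$. We will show later in this proof that for $z \in [b_1, b_4]$,

$$
(T_{\tilde\mu} h_k)(z) = (T h_k)(z). \tag{65}
$$

Since $(T_{\tilde\mu} h_k)(z) - \tilde\rho = \tilde h(z)$ for all $z \in [b_1, b_4]$ (see eqs. (49)-(54)), it follows that $h_{k+1}(z) = \tilde h(z)$ for all $z \in [b_1, b_4]$.

Now, let us show that (65) holds. Recall that in the proof of Lemma 7 we showed that $f_k(\delta, \gamma)$, defined in eq. (64), is concave in $(\delta, \gamma)$. The derivative with respect to $\delta$ is

$$
\frac{\partial f_k}{\partial \delta} = \frac{1}{2} \log \frac{1 - \delta + \gamma}{1 + \delta - \gamma} + 1 + \frac{1}{2} h_k\!\left( \frac{2\delta}{1 + \delta - \gamma} \right) - \frac{1}{2} h_k\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right) + \frac{1 - \gamma}{1 + \delta - \gamma}\, h_k'\!\left( \frac{2\delta}{1 + \delta - \gamma} \right) - \frac{\gamma}{1 - \delta + \gamma}\, h_k'\!\left( 1 - \frac{2\gamma}{1 - \delta + \gamma} \right). \tag{66}
$$

The derivative with respect to $\gamma$ is entirely analogous and can be obtained by mutually exchanging $\gamma$ and $\delta$.

For $z \in [b_2, b_3]$, the action $\tilde\gamma(z) = \tilde\delta(z) = \frac{3 - \sqrt{5}}{2}$ is feasible and $\frac{2 \tilde\gamma(z)}{1 - \tilde\delta(z) + \tilde\gamma(z)} = \frac{2 \tilde\delta(z)}{1 + \tilde\delta(z) - \tilde\gamma(z)} = b_4$. Moreover, it is straightforward to check that the derivatives of $f_k$ are zero at $(\tilde\gamma(z), \tilde\delta(z))$, and since $f_k$ is concave, $(\tilde\gamma(z), \tilde\delta(z))$ attains the maximum. Hence, $(T_{\tilde\mu} h_k)(z) = (T h_k)(z)$ for $z \in [b_2, b_3]$.

For $z \in [b_3, b_4]$, $\tilde\gamma(z) = 1 - z$ and $\tilde\delta(z) = \frac{\sqrt{5} - 1}{2} z$. Note that $\frac{2 \tilde\delta(z)}{1 + \tilde\delta(z) - \tilde\gamma(z)}$ and $1 - \frac{2 \tilde\gamma(z)}{1 - \tilde\delta(z) + \tilde\gamma(z)}$ are in $[b_1, b_2] \cup [b_3, b_4]$. Using the expressions for $\tilde h(z)$ given in equations (53) and (54), we can write the derivatives of $f_k$ at $(\tilde\delta(z), \tilde\gamma(z))$ as

$$
\frac{\partial f_k}{\partial \delta} = \log \frac{1 - \tilde\delta(z) - \tilde\gamma(z)}{2 \tilde\delta(z)} + \tilde\rho + 1, \tag{67}
$$

$$
\frac{\partial f_k}{\partial \gamma} = \log \frac{1 - \tilde\delta(z) - \tilde\gamma(z)}{2 \tilde\gamma(z)} + \tilde\rho + 1. \tag{68}
$$

Notice that $\tilde\gamma(z)$ is the maximum of the feasible set $[0, 1 - z]$ and the derivative of $f_k$ with respect to $\gamma$ at $(\tilde\delta(z), \tilde\gamma(z))$ is positive. In addition, $\tilde\delta(z)$ is in the interior of the feasible set $[0, z]$ and the derivative of $f_k$ with respect to $\delta$ at $(\tilde\delta(z), \tilde\gamma(z))$ is zero. Since $f_k$ is concave, any feasible change in $(\tilde\gamma(z), \tilde\delta(z))$ will decrease the value of the function. Hence, $(T_{\tilde\mu} h_k)(z) = (T h_k)(z)$ for $z \in [b_3, b_4]$. The situation for $z \in [b_1, b_2]$ is completely analogous. ■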

Proof of 8.4: From Propositions 8.1-8.3, it follows that the maximum over $z$ of $h_k(z)$ is attained at $z = 1/2$ and $h_k(1/2) = 1$ for all $k$. Furthermore, because of concavity and symmetry the minimum of $h_k(z)$ is attained at $z = 0$ and $z = 1$. Hence it is enough to show that $h_k(0)$ is uniformly bounded from below for all $k$.

For $z = 0$ let us consider the action $\gamma = b_2$ and $\delta = 0$, and for $b_1 \le z \le b_4$ the action $\tilde\gamma(z), \tilde\delta(z)$. Now let us prove that under this policy $h_k(0)$, which is less than or equal to the optimal value, is uniformly bounded. Under this policy $h_{k+1}(0) = (T h_k)(0) - \tilde\rho$ becomes

$$
h_k(0) = c + \alpha\, h_{k-1}(0) + (1 - \alpha) \cdot 1 - \tilde\rho, \tag{69}
$$

where $c$ and $\alpha$ are constants: $c = H\!\left( \frac{1 - b_2}{2} \right) + b_2 - 1$, $\alpha = \frac{1 - b_2}{2}$. Iterating equation (69) $k - 1$ times we get

$$
h_k(0) = (c + 1 - \alpha - \tilde\rho) \sum_{i=0}^{k-1} \alpha^i + \alpha^k h_0(0). \tag{70}
$$

Since $\alpha < 1$, $h_k(0)$ is uniformly bounded for all $k$. ■

Proof of 8.5: By Proposition 8.1, $h_k$ is concave for each $k$, and by Proposition 8.3, $h_k(z) = \tilde h(z)$ for $z \in [b_1, b_4]$. Since $h_0$ is the pointwise maximum of functions satisfying this condition, we must have $h_0 \ge h_1$. It is easy to see that $T$ is a monotonic operator. As such, $h_k \ge h_{k+1}$ for all $k$. Proposition 8.4 establishes that the sequence is bounded below, and therefore it converges pointwise. ■

Proof of 8.6: By Proposition 8.1, each $h_k$ is concave and continuous. Further, by Proposition 8.5, the sequence has a pointwise limit $h^*$ which is concave. Concavity of $h^*$ implies continuity [28, Theorem 10.1] over $(0, 1)$. Let $h^\dagger$ be the continuous extension of $h^*$ from $(0, 1)$ to $[0, 1]$. Since $h^*$ is concave, $h^\dagger \ge h^*$.

By Proposition 8.5, $h_k \ge h^*$. It follows from continuity of $h_k$ that $h_k \ge h^\dagger$. Hence, $h^*(z) = \lim_k h_k(z) \ge h^\dagger(z)$ for $z \in [0, 1]$. Recalling that $h^* \le h^\dagger$, we have $h^* = h^\dagger$.

Since the iterates $h_k$ are continuous and monotonically nonincreasing and their pointwise limit $h^*$ is continuous, $h_k$ converges uniformly by Dini's Theorem [29]. ■

The following theorem verifies our conjectures. 

Theorem 9: The function $h^*$ and scalar $\tilde\rho$ satisfy $\tilde\rho \mathbf{1} + h^* = T h^*$. Further, $\tilde\rho$ is the optimal average reward and there is an optimal policy that takes actions $\delta_t = \tilde\delta(z_{t-1})$ and $\gamma_t = \tilde\gamma(z_{t-1})$ whenever $z_{t-1} \in [b_1, b_4]$.

Proof: Since the sequence $h_{k+1} = T h_k - \tilde\rho \mathbf{1}$ converges uniformly and $T$ is sup-norm continuous, $h^* = T h^* - \tilde\rho \mathbf{1}$. It follows from Theorem 5 that $\tilde\rho$ is the optimal average reward. Together with Proposition 8.3, this implies existence of an optimal policy that takes actions $\delta_t = \tilde\delta(z_{t-1})$ and $\gamma_t = \tilde\gamma(z_{t-1})$ whenever $z_{t-1} \in [b_1, b_4]$.



VII. A Capacity-Achieving Scheme 

In this section we describe a simple encoder and decoder pair that provides error-free communication through the 
trapdoor channel with feedback and known initial state. We then show that the rates achievable with this encoding 
scheme are arbitrarily close to capacity. 

It will be helpful to discuss the input and output of the channel in different terms. Recall that the state of the 
channel is known to the transmitter because it is a deterministic function of the previous state, input, and output, 
and the initial state is known. Let the input action, $\tilde x$, be one of the following:

$$
\tilde x = \begin{cases} 0, & \text{input ball is same as state} \\ 1, & \text{input ball is opposite of state.} \end{cases}
$$

Also let the output be recorded differentially as

$$
\tilde y = \begin{cases} 0, & \text{received ball is same as previous} \\ 1, & \text{received ball is opposite of previous,} \end{cases}
$$

where $\tilde y_1$ is undefined and irrelevant for our scheme.



A. Encode/Decode Scheme 

Encoding. Each message is mapped to a unique binary sequence of $N$ actions, $\tilde x^N$, that ends with 0 and has no occurrences of two 1's in a row. The input to the channel is derived from the action and the state as $x_k = \tilde x_k \oplus s_{k-1}$.

Decoding. The channel outputs are recorded differentially as $\tilde y_k = y_k \oplus y_{k-1}$, for $k = 2, \ldots, N$. Decoding of the action sequence is accomplished in reverse order, beginning with $\tilde x_N = 0$ by construction.

TABLE IV

Decoding the input from the next output and input.

          $\tilde x_{k+1}$   $\tilde y_{k+1}$   $\tilde x_k$
Case 1    any                0                  0
Case 2    1                  any                0
Case 3    0                  1                  1



Lemma 10: If $\tilde x_{k+1}$ is known to the decoder, $\tilde x_k$ can be correctly decoded.

Proof: Table IV shows how to decode $\tilde x_k$ from $\tilde x_{k+1}$ and $\tilde y_{k+1}$.

Proof of case 1. Assume that $\tilde x_k = 1$. At time $k$, just before the output is received, there are balls of both types in the channel. By symmetry we can assume that the ball that exits is labeled '0'. Therefore, the ball labeled '1' remains in the channel. According to the encoding scheme, $\tilde x_{k+1} = 0$ because repeated 1's are not allowed, which means the input to the channel at time $k + 1$ is labeled '1'. It is clear that the ball that comes out of the channel at time $k + 1$ must be labeled '1'. This leads to the contradiction, $\tilde y_{k+1} = 1$.

Proof of case 2. By construction there are never two 1's in a row.

Proof of case 3. Assume that $\tilde x_k = 0$. The balls that enter the channel both at times $k$ and $k + 1$ are the same type as the ball that is in the channel, therefore that same type of ball must come out each of the two times. This leads to the contradiction, $\tilde y_{k+1} = 0$.

■

Decoding example. Table V shows an example of decoding a sequence of actions for $N = 10$.

TABLE V
Decoding Example

Variable        Value         Reason
$y^n$           1011010001    Channel output
$\tilde y^n$    -110111001    Differential output
$\tilde x^n$             0    Given
                        10    Case 3
                       010    Case 1 or 2
                      0010    Case 1
                     10010    Case 3
                    010010    Case 2
                   1010010    Case 3
                  01010010    Case 1 or 2
                 101010010    Case 3
                0101010010    Case 2
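
The scheme can be exercised end to end with a small simulation. The sketch below is an illustration under the channel model described in this paper (either ball exits with probability 1/2, and the other ball becomes the new state); the function names `valid_actions`, `transmit`, and `decode` are introduced here and are not from the original text. The decoder applies the three cases of Table IV in reverse order.

```python
import itertools
import random


def valid_actions(n):
    """All length-n action sequences that end with 0 and contain no two 1's in a row."""
    for seq in itertools.product((0, 1), repeat=n):
        if seq[-1] == 0 and all(not (a and b) for a, b in zip(seq, seq[1:])):
            yield list(seq)


def transmit(actions, s0, rng):
    """Send an action sequence through a simulated trapdoor channel.

    The transmitter tracks the state through feedback of the outputs."""
    state, outputs = s0, []
    for a in actions:
        x = a ^ state                # x_k = action XOR s_{k-1}
        y = rng.choice((state, x))   # either ball exits with probability 1/2
        state = state ^ x ^ y        # the ball that stays behind
        outputs.append(y)
    return outputs


def decode(outputs):
    """Recover the action sequence from the channel outputs, last action first."""
    n = len(outputs)
    actions = [0] * n                # the last action is 0 by construction
    for k in range(n - 2, -1, -1):
        y_diff = outputs[k + 1] ^ outputs[k]
        # Case 3: next action 0 and differential output 1 means the action was 1.
        # Cases 1 and 2: otherwise the action was 0.
        actions[k] = 1 if (actions[k + 1] == 0 and y_diff == 1) else 0
    return actions


if __name__ == "__main__":
    rng = random.Random(0)
    message = [0, 1, 0, 1, 0, 1, 0, 0, 1, 0]
    received = transmit(message, s0=1, rng=rng)
    print("channel output:", received)
    print("decoded actions:", decode(received))
```

Decoding the channel output of Table V with `decode` reproduces the action sequence listed in its last row.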



B. Rate 

Under this encoding scheme, the number of admissible unique action sequences is the number of binary sequences of length $N - 1$ without any repeating 1's. This is known to be exponentially equivalent to $\phi^{N-1}$, where $\phi$ is the golden ratio (see question 2 in Section VI-C). Since $\lim_{N \to \infty} \frac{N-1}{N} \log \phi = \log \phi$, rates arbitrarily close to $\log \phi$ are achievable.

C. Remarks 

Early decoding. Decoding can often begin before the entire block is received. Table IV shows us that we can decode $\tilde x_k$ without knowledge of $\tilde x_{k+1}$ for any $k$ such that $\tilde y_{k+1} = 0$. Decoding can begin from any such point and work backward.

Preparing the channel. This communication scheme can still be implemented even if the initial state of the channel 
is not known as long as some channel uses are expended to prepare the channel for communication. The repeating 
sequence 010101... can be used to flush the channel until the state becomes evident. As soon as the output of the 
channel is different from the input, both the transmitter (through feedback) and the receiver know that the state is 
the previous input. At that point, zero-error communication can begin as described above. 

This flushing method requires a random and unbounded number of channel uses. However, it only needs to 
be performed once after which multiple blocks of communication can be accomplished. The expected number 
of required channel uses is easily found to be 3.5, since the number of uses is geometrically distributed when 
conditioned on the initial state. 
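
The value 3.5 can be reproduced with a short calculation. The sketch below assumes, beyond what the text states, that the initial state $s_0$ is equally likely to be 0 or 1 and that the flushing sequence starts with input 0. The symbols $T$ (the number of channel uses until the output first differs from the input) and $G$ (a geometric random variable with success probability $1/2$, so $E[G] = 2$) are introduced here for illustration. If $s_0 = 1$, a mismatch can occur only on odd-numbered uses, where input 0 meets a stored 1, and it occurs with probability $1/2$ each time; if $s_0 = 0$, the same holds for even-numbered uses. Hence

$$
T = \begin{cases} 2G - 1, & s_0 = 1, \\ 2G, & s_0 = 0, \end{cases}
\qquad
E[T] = \tfrac{1}{2}\,(2 E[G] - 1) + \tfrac{1}{2}\,(2 E[G]) = \tfrac{1}{2} \cdot 3 + \tfrac{1}{2} \cdot 4 = 3.5.
$$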

Permuting relay channel similarity. The permuting relay channel described in [3] has the same capacity as the 
trapdoor channel with feedback. A connection can be made using the achievable scheme described in this section. 

The permuting relay channel supposes that the transmitter chooses an input distribution to the channel that is 
independent of the message to be sent. The transmitter lives inside the trapdoor channel and chooses which of the 
two balls will be released to the receiver in order to send the message. Without proof here, let us assume that the 
deterministic input 010101... is optimal. Now we count how many distinguishable outputs are possible. 

It is helpful to view this as a permutation channel as described in Section II where the permuting is not done
randomly but deliberately. Notice that for this input sequence, after each time that a pair of different numbers 
is permuted, the next pair of numbers will be the same, and the associated action will have no consequence. 
Therefore, the number of distinguishable permutations can be easily shown to be related to the number of unique 
binary sequences without two l's in a row. 

Three channels have same feedback capacity. The achievable scheme in this section allows zero-error communication. Therefore, this scheme could also be used to communicate with feedback through the permuting jammer channel from [3], which assumes that the trapdoor channel behavior is not random but is the worst possible to make communication difficult.

In the permuting relay channel [3], all information (input and output) is available to the transmitter, so feedback 
is irrelevant. Thus we find that the feedback capacity (with known initial state) is the same for the trapdoor, 
permuting jammer, and permuting relay channels. 

Constrained coding. The capacity-achieving scheme requires uniquely mapping a message to a sequence with the 
constraint of having no two 1 's in a row. A practical way of accomplishing this can be done by using a technique 
called enumeration [30]. The technique translates the message into codewords and vice versa by invoking an 
algorithmic procedure rather than using a lookup table. Vast literature on coding a source word into a constrained
sequence can be found in [31] and [32]. 



VIII. Conclusion and Further Work 

This paper gives an information theory formulation for the feedback capacity of a strongly connected unifilar 
finite state channel and it shows that the feedback capacity expression can be formulated as an average-reward
dynamic program. For the trapdoor channel, we were able to solve explicitly the dynamic programming problem 
and to show that the capacity of the channel is the log of the golden ratio. Furthermore, we were able to find a 
simple encoding/decoding scheme that achieves this capacity. 

There are several directions in which this work can be extended. 

• Generalization: Extend the trapdoor channel definition. It is possible to add parameters to the channel and 
make it more general. For instance, there could be a parameter that determines which ball from the two has 
the higher probability of being the output of the channel. Other parameters might include the number of balls 
that can be in the channel at the same time or the number of different types of balls that are used. These tie 
in nicely with viewing the trapdoor channel as a chemical channel. 

• Unifilar FSC Problems: Find strongly connected unifilar FSC's that can be solved, similar to the way we 
solved the trapdoor channel. 
• Dynamic Programming: Classify a family of average-reward dynamic programs that have analytic solutions.

Appendix

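Proof of Lemma 6: For any $z_1, z_2 \in [0, 1]$ and $\theta \in (0, 1)$,

$$
\begin{aligned}
\psi(\theta z_1 + (1 - \theta) z_2) &= \sup_{\delta \in [0,\, \theta z_1 + (1 - \theta) z_2]}\ \sup_{\gamma \in [0,\, 1 - (\theta z_1 + (1 - \theta) z_2)]} \zeta(\delta, \gamma) \\
&= \sup_{\delta_1 \in [0,\, \theta z_1]}\ \sup_{\delta_2 \in [0,\, (1 - \theta) z_2]}\ \sup_{\gamma_1 \in [0,\, \theta (1 - z_1)]}\ \sup_{\gamma_2 \in [0,\, (1 - \theta)(1 - z_2)]} \zeta(\delta_1 + \delta_2,\ \gamma_1 + \gamma_2) \\
&\overset{(a)}{=} \sup_{\delta_1' \in [0, z_1]}\ \sup_{\delta_2' \in [0, z_2]}\ \sup_{\gamma_1' \in [0, 1 - z_1]}\ \sup_{\gamma_2' \in [0, 1 - z_2]} \zeta\big( \theta \delta_1' + (1 - \theta) \delta_2',\ \theta \gamma_1' + (1 - \theta) \gamma_2' \big) \\
&\overset{(b)}{\ge} \sup_{\delta_1' \in [0, z_1]}\ \sup_{\delta_2' \in [0, z_2]}\ \sup_{\gamma_1' \in [0, 1 - z_1]}\ \sup_{\gamma_2' \in [0, 1 - z_2]} \theta\, \zeta(\delta_1', \gamma_1') + (1 - \theta)\, \zeta(\delta_2', \gamma_2') \\
&= \sup_{\delta_1' \in [0, z_1]}\ \sup_{\gamma_1' \in [0, 1 - z_1]} \theta\, \zeta(\delta_1', \gamma_1') + \sup_{\delta_2' \in [0, z_2]}\ \sup_{\gamma_2' \in [0, 1 - z_2]} (1 - \theta)\, \zeta(\delta_2', \gamma_2') \\
&= \theta\, \psi(z_1) + (1 - \theta)\, \psi(z_2). \tag{71}
\end{aligned}
$$

Step (a) is a change of variable ($\theta \delta_1' = \delta_1$, $(1 - \theta) \delta_2' = \delta_2$, $\theta \gamma_1' = \gamma_1$, $(1 - \theta) \gamma_2' = \gamma_2$). Step (b) is due to concavity of $\zeta$. ■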



